Abstract

We present methods that generate cooperative strategies for multi-vehicle
control problems using a decomposition approach. By introducing a set of
tasks to be completed by the team of vehicles and a task execution method for
each vehicle, we decompose the problem into a combinatorial component and a
continuous component. The continuous component of the problem is captured by
task execution, and the combinatorial component is captured by task
assignment. In this paper, we present a solver for task assignment that
generates near-optimal assignments quickly and can be used in real-time
applications. To motivate our methods, we apply them to an adversarial game
between two teams of vehicles. One team is governed by simple rules and the
other by our algorithms. In our study of this game, we found phase
transitions, showing that the task assignment problem is most difficult to
solve when the capabilities of the adversaries are comparable. Finally, we
implement our algorithms in a multi-level architecture with a variable
replanning rate at each level to provide feedback on a dynamically changing
and uncertain environment.

1 Introduction 

Using a team of vehicles to accomplish an objective can be effective for
problems involving a set of tasks distributed in space and time. Examples of
such problems include multi-target intercept, terrain mapping [21],
reconnaissance [?], and surveillance [?]. To achieve effective solutions, a
vehicle team generally needs to follow a cooperative policy. The generation of
such a policy has been the subject of a rich literature in cooperative
control. A sample of the noteworthy work in this field includes a language for
modeling and programming cooperative control systems [18], receding horizon
control for multi-vehicle systems [5], non-communicative multi-robot
coordination [21], hierarchical methods for target assignment and intercept
[?], cooperative estimation for reconnaissance problems [28], mixed integer
linear programming methods for cooperative control [33, 13], the compilation
on multi-robot systems in dynamic environments [22], and the compilation on
cooperative control and optimization [26].
When multi-vehicle teams operate in dynamically changing and uncertain
environments, the model predictive approach [23] can be used to provide
feedback. This approach involves frequently recomputing the team control
policy in real time. However, because these systems are often hybrid dynamical
systems, computing a cooperative policy is often computationally hard. The
challenge addressed in this paper is (1) to develop a method that generates
near-optimal cooperative policies quickly and (2) to implement the method
effectively.

In our previous work on cooperative control [13, 11], we developed mixed
integer linear programming methods because of their expressiveness and ease 
of modeling many types of problems. The drawback is that real-time planning 
is infeasible because of the computational complexity of the approach. This 
motivated us to develop a trajectory primitive decomposition approach to the 
problem. This approach finds near-optimal solutions quickly, allowing real-time 
implementation, and can be tuned to balance the tradeoff between optimality 
and computational effort for the particular problem at hand. The drawback, 
compared to our previous work, is that it is limited to cooperative control prob- 
lems in which vehicle tasks can be clearly defined and efficient primitives exist. 

In this paper, we present our trajectory primitive decomposition approach. 
We analyze the average case behavior of the approach by solving instances of a 
cooperative control problem derived from Cornell's RoboFlag environment.
Finally, we implement the approach in a hierarchical architecture with variable
replanning rates at each level and test the implementation in a dynamically 
changing and uncertain RoboFlag environment. 

The trajectory primitive decomposition involves the introduction of a set of 
tasks to be executed by the vehicles, allowing the problem to be separated into a 
low-level component, called task execution, and a high-level component, called 
task assignment. The task execution component is formulated as an optimal 
control problem, which explicitly involves the vehicle dynamics. Given a vehicle 
and a task, the goal is to find the control inputs necessary to execute the given 
task in an optimal way. The task assignment component is an NP-hard [14]
combinatorial optimization problem. The goal is to assign a sequence of tasks to 
each vehicle so that the team objective is optimized. Task assignment does not 
explicitly involve the vehicle dynamics because the task execution component 
is utilized as a trajectory primitive. 

We have developed a branch and bound algorithm to solve the task assign- 
ment problem. One of the benefits of this algorithm is that it can be stopped 
at any time in the solution process and the output is the best feasible assign- 
ment found in that time. This is advantageous for real-time applications where 
control strategies must be generated within a time window. In this case, the 
best solution found in the time window is used. Another advantage is that the 
algorithm is complete; given enough time, it will find the optimal solution. 

To analyze the average case performance of the branch and bound solver, we 
generate and solve many instances of the problem. We look at computational 
complexity, convergence to the optimal assignment, and performance variations 
with parameter changes. We found that the solver converges to the optimal 
assignment quickly. However, the solver takes much more time to prove the
assignment is optimal. Therefore, if the solver is terminated early, the best 
feasible assignment found in that time is likely to be a good one. We also found 
several phase transitions in the task assignment problem, similar to those found 
in the pioneering work [20, 34, 25]. At the phase transition point, the task
assignment problem is much harder to solve. For cooperative control problems
involving adversaries, the transition point occurs when the capabilities of the
two teams are comparable. This behavior is similar to the complexity of
balanced games like chess [15].

Finally, we implement the methods in a multi-level architecture with re- 
planning occurring at each level, at different rates (multi-level model predictive 
control). The motivation is to provide feedback to help handle dynamically 
changing environments. 

The paper is organized as follows. In Section 2, we state the multi-vehicle
cooperative control problem and introduce the decomposition. In Section 3, we
introduce the example problem used to motivate our approach. In Section 4, we
describe our solver for the task assignment problem, and in Section 5, we
analyze its average case behavior. Finally, in Section 6, we apply our solver
in a dynamically changing and uncertain environment using a multi-level model
predictive control architecture for feedback. A web page that accompanies this
paper can be found at [10].

2 Multi-vehicle task assignment

The general multi-vehicle cooperative control problem consists of a heteroge- 
neous set of vehicles (the team), an operating environment, operating con- 
straints, and an objective function. The goal is to generate a team strategy 
that minimizes the objective function. The strategy in its lowest level form is 
the control inputs to each vehicle of the team. 

In our previous work, we showed how to solve this problem using hybrid
systems tools.
This approach is successful in determining optimal strategies for complex multi- 
vehicle problems, but becomes computationally intensive for large problems. 
Motivated to find faster techniques, we have developed a decomposition ap- 
proach described in this paper. 

The key to the decomposition is to introduce a relevant set of tasks for the 
problem being considered. Using these tasks, the problem can be decomposed 
into a task completion component and a task assignment component. The task 
completion component is a low level problem, which involves a vehicle and a 
task to be completed. The task assignment component is a high level problem, 
which involves the assignment of a sequence of tasks to be completed by each 
vehicle in the team. 

Task Completion: Given a vehicle, an operating environment with con- 
straints, a task to be completed, and an objective function, find the control 
inputs to the vehicle such that the constraints are satisfied, the task is com- 
pleted, and the objective is minimized. 
Figure 1: The framework for the task assignment problem using task completion
primitives.

Task Assignment: Given a set of vehicles, a task completion algorithm for 
each vehicle, a set of tasks to be completed, and an objective function, assign a 
sequence of tasks to each vehicle such that the objective function is minimized. 

In the task assignment problem, instead of varying the control inputs to 
the vehicles to find an optimal strategy, we vary the sequence of tasks assigned 
to each vehicle. This problem is a combinatorial optimization problem and 
does not explicitly involve the dynamics of the vehicles. However, in order to 
calculate the objective function for any particular assignment, we must use the 
task completion algorithm. Task completion acts as a primitive in solving the 
task assignment problem, as shown by the framework in Figure 1. Using the low
level component (task completion) , the high level component (task assignment) 
need not explicitly consider the detailed dynamics of the vehicles required to 
perform a task. 

3 RoboFlag Drill 

To motivate and make concrete our decomposition approach, we illustrate the 
approach on an example problem derived from Cornell's multi-vehicle system 
called RoboFlag. For an introduction to RoboFlag, see the papers from the 
invited session on RoboFlag in the Proceedings of the 2003 American Control 
Conference [?]. In [?], protocols for the RoboFlag Drill are analyzed using
a computation and control language. 

The RoboFlag Drill involves two teams of vehicles, the defenders and the 
attackers, on a playing field with a circular region of radius $R_{dz}$ at its
center called the Defense Zone (Figure 2). The attackers' objective is to fill
the Defense Zone with as many attackers as possible. They have a fixed strategy
in which each moves toward the Defense Zone at constant velocity. An attacker
stops if it is intercepted by a defender or if it enters the Defense Zone. The
defenders' objective is to prevent as many attackers as possible from entering
the Defense Zone without entering the zone themselves. A defender denies an
attacker entry to the Defense Zone by intercepting the attacker before it
reaches the Defense Zone.

Figure 2: The RoboFlag Drill used to motivate the methods presented in this
paper. The drill takes place on a playing field with a Defense Zone at its
center. The objective is to design a cooperative control strategy for the team
of defending vehicles (black) that minimizes the number of attacking vehicles
(white) that enter the Defense Zone.

The wheeled vehicles of Cornell's RoboCup Team are the defenders in the
RoboFlag Drill problem we consider in this paper. Each vehicle is equipped
with a three-motor omni-directional drive that allows it to move along any
direction irrespective of its orientation. This allows for superior
maneuverability compared to traditional nonholonomic (car-like) vehicles. A
local control system on the vehicle, presented in [22] and in the Appendix,
alters the dynamics so that, at a higher level of the hierarchy, the vehicle
dynamics are governed by

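$$\begin{aligned} \ddot{x}(t) + \dot{x}(t) &= u_x(t) \\ \ddot{y}(t) + \dot{y}(t) &= u_y(t) \\ u_x(t)^2 + u_y(t)^2 &\le 1. \end{aligned} \tag{1}$$

The state vector is $\mathbf{x} = (x, y, \dot{x}, \dot{y})$, and the control
input vector is $\mathbf{u} = (u_x, u_y)$.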
These equations are less complex than the nonlinear governing equations of 
the vehicles. They allow for the generation of feasible near-optimal trajectories 
with little computational effort and have been used successfully in the RoboCup 
competition. 
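As a minimal sketch of what equation (1) implies, the code below integrates the
simplified dynamics with forward Euler. The function names, the step size `dt`,
and the choice to enforce the input constraint by rescaling $\mathbf{u}$ onto
the unit disk are our own assumptions for illustration; this is not the local
control system of [22].

```python
import math
from typing import List, Tuple

State = Tuple[float, float, float, float]  # (x, y, xdot, ydot)


def saturate(ux: float, uy: float) -> Tuple[float, float]:
    """Rescale (ux, uy) onto the unit disk so that ux^2 + uy^2 <= 1."""
    norm = math.hypot(ux, uy)
    if norm > 1.0:
        return ux / norm, uy / norm
    return ux, uy


def simulate(state: State, controls: List[Tuple[float, float]], dt: float) -> State:
    """Forward-Euler integration of xddot + xdot = ux, yddot + ydot = uy."""
    x, y, vx, vy = state
    for ux, uy in controls:
        ux, uy = saturate(ux, uy)
        # Equation (1) rearranged: xddot = ux - xdot, yddot = uy - ydot.
        ax, ay = ux - vx, uy - vy
        x, y = x + dt * vx, y + dt * vy
        vx, vy = vx + dt * ax, vy + dt * ay
    return (x, y, vx, vy)


# Full thrust along x for 10 s, starting at rest.
final = simulate((0.0, 0.0, 0.0, 0.0), [(1.0, 0.0)] * 10000, dt=0.001)
print(final)
```

With full thrust along $x$, the velocity approaches $1 - e^{-t}$, so in steady
state ($\ddot{x} = 0$) the speed equals $|u_x| \le 1$: the unit input bound
also acts as a speed limit.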

Each attacker has two discrete modes: active and inactive. When active, the 
attacker moves toward the Defense Zone at constant velocity along a straight 
line path. The attacker, which is initially active, transitions to inactive mode 
if the defender intercepts it or if it enters the Defense Zone. Once inactive, the 
attacker does not move and remains inactive for the remainder of play. These 
dynamics are captured by the discrete time equations 

$$p[k+1] = p[k] + v_p T a[k], \qquad q[k+1] = q[k] + v_q T a[k] \tag{2}$$

and the state machine (see Figure 3)

$$a[k+1] = \begin{cases} 1 & \text{if } (a[k] = 1) \text{ and (not in Defense Zone) and (not intercepted)} \\ 0 & \text{if } (a[k] = 0) \text{ or (in Defense Zone) or (intercepted)} \end{cases} \tag{3}$$

for all $k$ in the set $\{1, \ldots, N_a\}$. The initial conditions are

$$p[0] = p_s, \qquad q[0] = q_s, \qquad a[0] = 1. \tag{4}$$

Figure 3: The two-state (active and inactive) attacker state machine. The
attacker starts in the active state. It transitions to the inactive state, and
remains in this state, if it enters the Defense Zone or if it is intercepted by
a defender.
In these equations, $N_a$ is the number of samples, $T$ is the sample time,
$(p[k], q[k])$ is the attacker's position at time $t_a[k] = kT$, $(v_p, v_q)$
is its constant velocity vector, and $a[k] \in \{0, 1\}$ is a discrete state
indicating the attacker's mode. The attacker is active when $a[k] = 1$ and
inactive when $a[k] = 0$. Given $(p[k], q[k])$ and $a[k]$, we can calculate the
attacker's position at any time $t$, denoted $\mathbf{p}(t) = (p(t), q(t))$,
using the equations

$$p(t) = p[k] + v_p\, a[k]\, (t - t_a[k]), \qquad q(t) = q[k] + v_q\, a[k]\, (t - t_a[k]), \tag{5}$$

where $k = \lfloor t/T \rfloor$.
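The attacker model of equations (2) to (5) can be sketched as a small Python
class. We assume the Defense Zone is a disk of radius `r_dz` centered at the
origin, evaluate "in Defense Zone" at the updated position, and pass
interception in as a flag; the names `Attacker`, `rollout`, and `position_at`
are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Attacker:
    """Discrete-time attacker of equations (2)-(4); a hypothetical sketch."""
    p: float       # p[k]
    q: float       # q[k]
    vp: float      # constant velocity component v_p
    vq: float      # constant velocity component v_q
    a: int = 1     # mode a[k]: 1 = active, 0 = inactive; a[0] = 1

    def step(self, T: float, r_dz: float, intercepted: bool = False) -> None:
        # Equation (2): move only while active.
        self.p += self.vp * T * self.a
        self.q += self.vq * T * self.a
        # Equation (3), with "in Defense Zone" checked at the new position (assumption).
        in_dz = math.hypot(self.p, self.q) <= r_dz
        self.a = 1 if (self.a == 1 and not in_dz and not intercepted) else 0


def rollout(att: Attacker, T: float, r_dz: float, n_samples: int) -> List[Tuple[float, float, int]]:
    """Samples (p[k], q[k], a[k]) for k = 0..n_samples, with no defender present."""
    traj = [(att.p, att.q, att.a)]
    for _ in range(n_samples):
        att.step(T, r_dz)
        traj.append((att.p, att.q, att.a))
    return traj


def position_at(traj, vp: float, vq: float, t: float, T: float) -> Tuple[float, float]:
    """Equation (5): continuous position from sample k = floor(t / T)."""
    k = min(math.floor(t / T), len(traj) - 1)
    pk, qk, ak = traj[k]
    dt = t - k * T
    return (pk + vp * ak * dt, qk + vq * ak * dt)


# Attacker starts at (10, 0) heading to a Defense Zone of radius 3 at unit speed.
traj = rollout(Attacker(10.0, 0.0, -1.0, 0.0), T=1.0, r_dz=3.0, n_samples=10)
print(traj[-1], position_at(traj, -1.0, 0.0, t=2.5, T=1.0))
```

In this example the attacker reaches the Defense Zone boundary at sample 7,
switches to inactive, and stays at $(3, 0)$ for the rest of the rollout.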

Because the goal of the RoboFlag Drill is to keep attackers out of the Defense 
Zone, attacker intercept is an obvious task for this problem. Therefore, the task 
completion problem for the RoboFlag Drill is an intercept problem. 

RoboFlag Drill Attacker Intercept (RDAI): Given a defender with state
$\mathbf{x}(t)$ governed by equation (1) with initial condition
$\mathbf{x}(0) = \mathbf{x}_s$, an attacker governed by equations (2) and (3)
with initial conditions $p[0] = p_s$, $q[0] = q_s$, $a[0] = 1$ and coordinates
$\mathbf{p}(t)$ given by equation (5), obstacles and restricted regions to be
avoided, the time-dependent final condition
$\mathbf{x}(t_f) = (p(t_f), q(t_f), 0, 0)$, and the objective function
$J_{tc} = t_f$, find the control inputs to the defender that minimize the
objective such that the constraints are satisfied.
The operating environment includes the playing field and the group of
attacking vehicles. The operating constraints include collision avoidance
between vehicles and avoidance of the Defense Zone (for the defending robots).

Next, we define notation for a primitive that generates a trajectory solving
the RDAI problem. The inputs to the primitive are the current state of defender
$d$, denoted $\mathbf{x}_d(t)$, and the current position of attacker $a$,
denoted $\mathbf{p}_a(t)$. The output is the amount of time it takes defender
$d$ to intercept attacker $a$, denoted $\Delta t_{\text{int}}(d,a,t)$, given by

$$\Delta t_{\text{int}}(d,a,t) := \text{intTime}[\mathbf{x}_d(t), \mathbf{p}_a(t)]. \tag{6}$$

If defender $d$ cannot intercept attacker $a$ before the attacker enters the
Defense Zone, we set $\Delta t_{\text{int}}(d,a,t) := \infty$.
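To show the interface of the primitive in equation (6), including the
$\infty$ convention, the sketch below swaps in a much cruder model than the
techniques of [23] or [12]: the defender is assumed to fly a straight line at a
constant maximum speed, ignoring the dynamics of equation (1) and Defense Zone
avoidance, and the attacker is assumed to head straight for the origin. The
function name `int_time` and its arguments are hypothetical.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]


def int_time(defender: Vec, speed: float, attacker: Vec, vel: Vec, r_dz: float) -> float:
    """Crude, hypothetical stand-in for intTime in equation (6).

    Assumes the defender flies a straight line at constant `speed` and the
    attacker moves at constant velocity `vel` straight toward the Defense Zone,
    a disk of radius `r_dz` centered at the origin. Returns math.inf if the
    intercept would happen only after the attacker reaches the Defense Zone.
    """
    rx, ry = attacker[0] - defender[0], attacker[1] - defender[1]
    vx, vy = vel
    c = rx * rx + ry * ry
    if c == 0.0:
        return 0.0  # already at the attacker
    # Intercept time t solves |r + v t| = speed * t, i.e. a t^2 + b t + c = 0.
    a = vx * vx + vy * vy - speed * speed
    b = 2.0 * (rx * vx + ry * vy)
    if abs(a) < 1e-12:
        roots = [-c / b] if b != 0.0 else []
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            return math.inf  # the defender can never catch up
        sq = math.sqrt(disc)
        roots = [(-b - sq) / (2.0 * a), (-b + sq) / (2.0 * a)]
    candidates = [t for t in roots if t > 0.0]
    if not candidates:
        return math.inf
    t_int = min(candidates)
    # Time for the attacker to reach the Defense Zone boundary.
    attacker_speed = math.hypot(vx, vy)
    if attacker_speed == 0.0:
        t_dz = math.inf
    else:
        t_dz = (math.hypot(attacker[0], attacker[1]) - r_dz) / attacker_speed
    return t_int if t_int < t_dz else math.inf


print(int_time((0.0, 5.0), 2.0, (10.0, 0.0), (-1.0, 0.0), r_dz=1.0))
```

A faster defender closer to the attacker's path gets a finite time; a slow or
distant one gets `math.inf`, which is exactly the value the cost function below
treats as an attacker entering the Defense Zone.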

Near-optimal solutions to the RDAI problem can be generated using the
technique presented in [23] with straightforward modifications. The advantage
of this technique is that it finds very good solutions quickly, which allows
for the exploration of many trajectories in the planning process. Another way
to generate near-optimal solutions for RDAI is to use the iterative mixed
integer linear programming techniques presented in [12]. The advantage of this
approach is that it can handle complex hybrid dynamical systems. Either of 
these approaches could be used as a primitive for the RDAI problem. Using the 
primitive, the RoboFlag Drill problem can be expressed as the following task 
assignment problem: 

RoboFlag Drill Task Assignment (RDTA): Given a team of defending vehicles
$\mathcal{D} = \{d_1, \ldots, d_n\}$, a set of attackers to intercept
$\mathcal{A} = \{a_1, \ldots, a_m\}$, initial conditions for each defender $d$
and for each attacker $a$, an RDAI primitive, and an objective function $J$,
assign each defender $d$ in $\mathcal{D}$ a sequence of attackers to intercept,
denoted $\alpha_d$, such that the objective function is minimized.

We introduce notation (listed in Table 1) to describe the cost function $J$
and the algorithm that solves the RDTA problem. Let $m_d$ be the number of
attackers assigned to defender $d$, and let
$\alpha_d = (\alpha_d(1), \ldots, \alpha_d(m_d))$ be the sequence of attackers
defender $d$ is assigned to intercept. Let $t_d(i)$ be the time at which
defender $d$ completes the $i$th task in its task sequence $\alpha_d$. Let
$\mathcal{A}_u$ be the set of unassigned attackers; then
$\mathcal{A} - \mathcal{A}_u$ is the set of assigned attackers.

An assignment for the RDTA problem is an intercept sequence $\alpha_d$ for
each defender $d$ in $\mathcal{D}$. A partial assignment is an assignment such
that $\mathcal{A}_u$ is not empty, and a complete assignment is an assignment
such that $\mathcal{A}_u$ is empty.

The set of times $\{t_d(i) : i = 1, \ldots, m_d\}$ for each defender $d$ is
computed using the primitive in equation (6). The time at which defender $d$
intercepts the $i$th attacker in its intercept sequence, if not empty, is given
by

$$t_d(i) = \begin{cases} t_d(i-1) & \text{if } \Delta t_{\text{int}}(d, \alpha_d(i), t_d(i-1)) = \infty \\ t_d(i-1) + \Delta t_{\text{int}}(d, \alpha_d(i), t_d(i-1)) & \text{otherwise,} \end{cases}$$

where we take $t_d(0) = 0$. If defender $d$ cannot intercept attacker
$\alpha_d(i)$ before the attacker enters the Defense Zone, the time $t_d(i)$ is
not incremented because, in this case, the defender does not attempt to
intercept the attacker. The time at which defender $d$ completes its intercept
sequence $\alpha_d$ is given by $t_d(m_d)$.

Table 1: Variables for RoboFlag Drill problems

  $n$                               number of defending vehicles
  $m$                               number of attacking vehicles
  $\mathcal{D}$                     the set of defending vehicles
  $\mathcal{A}$                     the set of attacking vehicles
  $\mathcal{A}_u$                   the set of unassigned attacking vehicles
  $\mathcal{A} - \mathcal{A}_u$     the set of assigned attacking vehicles
  $\mathbf{x}_d(t)$                 the state of defender $d$ at time $t$
  $\mathbf{p}_a(t)$                 the position of attacker $a$ at time $t$
  $\alpha_d$                        the sequence of attackers assigned to defender $d$
  $m_d$                             the length of defender $d$'s intercept sequence $\alpha_d$
  $\Delta t_{\text{int}}(d,a,t)$    time needed for $d$ to intercept $a$ starting at time $t$
  $t_d(i)$                          the time that $d$ completes the $i$th task in task sequence $\alpha_d$
  $\gamma_a$                        binary variable indicating if $a$ enters the Defense Zone
  $J$                               the cost function for the RDTA problem
  $\epsilon$                        weight in the cost function $J$

To indicate if attacker $a$ enters the Defense Zone during the drill, we
introduce the binary variable $\gamma_a$ given by

$$\gamma_a = \begin{cases} 1 & \text{if attacker } a \text{ enters the Defense Zone} \\ 0 & \text{otherwise.} \end{cases} \tag{7}$$

If $\gamma_a = 1$, attacker $a$ enters the Defense Zone at some time during
play; otherwise, $\gamma_a = 0$ and attacker $a$ is intercepted. We compute
$\gamma_a$ for each attacker $a$ in the set of assigned attackers
($\mathcal{A} - \mathcal{A}_u$) as follows: for each $d$ in $\mathcal{D}$ and
for each $i$ in $\{1, \ldots, m_d\}$, if
$\Delta t_{\text{int}}(d, \alpha_d(i), t_d(i-1)) = \infty$, set
$\gamma_{\alpha_d(i)} = 1$; otherwise, set $\gamma_{\alpha_d(i)} = 0$.

For the RDTA problem, the cost function has two components. The primary
component is the number of assigned attackers that enter the Defense Zone
during the drill,

$$J_1 = \sum_{a \in (\mathcal{A} - \mathcal{A}_u)} \gamma_a. \tag{8}$$

The secondary component is the time at which all assigned attackers that do
not enter the Defense Zone (all $a$ such that $\gamma_a = 0$) are intercepted,

$$J_2 = \max_{d \in \mathcal{D}} t_d(m_d). \tag{9}$$

The weighted combination is

$$J = \sum_{a \in (\mathcal{A} - \mathcal{A}_u)} \gamma_a + \epsilon \max_{d \in \mathcal{D}} t_d(m_d), \tag{10}$$

where we take $0 < \epsilon \ll 1$ because we want the primary component to
dominate: keeping attackers out of the Defense Zone is most important.
Therefore, our goal in the RDTA problem is to generate a complete assignment
($\mathcal{A}_u$ empty) that minimizes equation (10).
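The recursion for $t_d(i)$ and equations (7) to (10) can be evaluated for a
given complete assignment as sketched below. The primitive is passed in as a
callable `dt_int(d, a, t)` standing for $\Delta t_{\text{int}}$; the function
name `rdta_cost` and the table of intercept times in the example are
hypothetical (the toy times ignore $t$), and the default $\epsilon = 0.01$ is
the value used for Figure 4.

```python
import math
from typing import Callable, Dict, Sequence, Tuple


def rdta_cost(
    assignment: Dict[str, Sequence[str]],
    dt_int: Callable[[str, str, float], float],
    eps: float = 0.01,
) -> Tuple[float, Dict[str, int], Dict[str, float]]:
    """Return (J, gamma, finish) for a complete assignment, following (7)-(10).

    assignment maps each defender d to its intercept sequence alpha_d, and
    dt_int(d, a, t) plays the role of Delta t_int (math.inf = no intercept).
    """
    gamma: Dict[str, int] = {}
    finish: Dict[str, float] = {}
    for d, seq in assignment.items():
        t = 0.0  # t_d(0) = 0
        for a in seq:
            dt = dt_int(d, a, t)
            if math.isinf(dt):
                gamma[a] = 1  # a enters the Defense Zone; t_d(i) = t_d(i-1)
            else:
                gamma[a] = 0  # a is intercepted at t_d(i) = t_d(i-1) + dt
                t += dt
        finish[d] = t  # t_d(m_d)
    j1 = sum(gamma.values())                      # equation (8)
    j2 = max(finish.values()) if finish else 0.0  # equation (9)
    return j1 + eps * j2, gamma, finish           # equation (10)


# Hypothetical intercept times that ignore the start time t, for illustration only.
table = {("d1", "a1"): 2.0, ("d1", "a2"): math.inf, ("d2", "a3"): 5.0}
J, gamma, finish = rdta_cost({"d1": ["a1", "a2"], "d2": ["a3"]},
                             lambda d, a, t: table[(d, a)])
print(J, gamma, finish)
```

Here one attacker gets through ($J_1 = 1$) and the last intercept finishes at
$t = 5$, so $J = 1 + 0.01 \cdot 5 = 1.05$; because $\epsilon \ll 1$, no saving in
intercept time can compensate for an extra attacker in the Defense Zone.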

4 Branch and bound solver 

One way to find the optimal assignment for RDTA is by exhaustive search; try 
every possible assignment of tasks to vehicles and pick the one that minimizes J. 
This approach quickly becomes computationally infeasible for large problems. 
As the number of tasks or the number of vehicles increases, the total number
of possible assignments grows combinatorially. A more efficient solution method
is needed for real-time planning. With this motivation, we developed a branch 
and bound solver for the problem. In this section, we describe the solver and 
its four major components: node expansion, branching, upper bound, and lower 
bound. 

We use a search tree to enumerate all possible assignments for the problem. 
The root node represents the empty assignment, all interior nodes represent 
partial assignments, and the leaves represent the set of all possible complete
assignments. Given a node representing a partial assignment, the node expansion
algorithm (Section 4.1) generates the node's children. Using the node expansion
algorithm, we grow the search tree starting from the root node. The branching
algorithm (Section 4.2) is used to determine the order in which nodes are
expanded. In this algorithm, we use A* search to guide the growth of the
tree toward good solutions.

Given a node in the tree representing a partial assignment, the upper bound 
algorithm (Section 4.3) assigns the unassigned attackers in a greedy way. The
result is a feasible assignment. The cost of this assignment is an upper bound on 
the optimal cost that can be achieved from the given node's partial assignment. 
The upper bound is computed at each node explored in the tree (not all nodes 
are explored, many are pruned). As the tree is explored, the best upper bound 
found to date is stored in memory. 

Given a node in the search tree representing a partial assignment, the lower 
bound algorithm (Section 4.4) assigns the unassigned attackers in $\mathcal{A}_u$ using the
principle of simultaneity. Each defender is allowed to pursue multiple attackers 
simultaneously. Because this is physically impossible, the resulting assignment 
is potentially infeasible. Because no feasible assignment can do better, the cost 
of this assignment is a lower bound on the cost that can be achieved from the 
given node's partial assignment. Similar to the upper bound, the lower bound 
is computed at each node explored in the tree. 

If the lower bound for the current node being explored is greater than or
equal to the best upper bound found, we prune the node from the tree,
eliminating all nodes that emanate from the current node. This can be done
because of the way we have constructed the tree. The task sequences that make
up a parent's assignment are subsequences of the sequences that make up each
child's assignment, so every descendant's assignment extends the current
node's partial assignment, and the lower bound computed at the current node
also bounds the cost of every descendant. Therefore, exploring the descendants
will not result in a better assignment than that already obtained.

Table 2: Branch and bound algorithm

  1: Start with a tree containing only the root node.
  2: Run the upper bound algorithm with the root node's partial
     assignment (the empty assignment) as input, generating a
     feasible complete assignment.
  3: Set $J_{ub}^{best}$ to the cost of the complete assignment.
  4: Expand the root node using the node expansion routine.
  5: while growing the tree do
  6:   Use the branching routine to pick the next branch to explore.
  7:   Use the upper bound algorithm to compute a feasible complete
       assignment from the current node's partial assignment, and set
       the cost of this assignment to $J_{ub}$.
  8:   if $J_{ub} < J_{ub}^{best}$, set $J_{ub}^{best} := J_{ub}$.
  9:   Use the lower bound algorithm to calculate the lower bound cost
       from the current node's partial assignment, and set this cost
       to $J_{lb}$.
  10:  if $J_{lb} \ge J_{ub}^{best}$, prune the current node from the tree.
  11: end while

Before we describe the details of the components, we describe the branch and
bound algorithm listed in Table 2. Start with the root node, which represents
the empty assignment, and apply the upper bound algorithm. This generates a
feasible assignment with cost denoted $J_{ub}^{best}$ because it is the best,
and only, feasible solution generated so far. Next, apply the node expansion
algorithm to the root, generating its children.

At this point, enter a loop. For each iteration of the loop, apply the
branching algorithm to select the node to explore next. The node selected by
the branching algorithm, which we call the current node, contains a partial
assignment. Apply the upper bound algorithm to the current node, generating a
feasible complete assignment with cost denoted $J_{ub}$. If $J_{ub}$ is less
than $J_{ub}^{best}$, we have found a better feasible assignment, so we set
$J_{ub}^{best} := J_{ub}$. Next, apply the lower bound algorithm to generate an
optimistic cost, denoted $J_{lb}$, from the current node's partial assignment.
If $J_{lb}$ is greater than or equal to the best feasible cost found so far,
$J_{ub}^{best}$, prune the node from the tree, removing all of its descendants
from the search. We do not need to consider the descendants of this node
because doing so will not result in a better feasible assignment than the one
found already, with cost $J_{ub}^{best}$. The loop continues until all nodes
have been explored or pruned away. The result is the optimal assignment for
the RDTA problem.

Figure 4: The solution to two instances of the RDTA problem using the branch
and bound solver. The circle at the center of the field is the Defense Zone.
The lines with asterisks denote the attacker trajectories, and the lines
without asterisks denote defender trajectories. The parameters for these
instances are $\epsilon = 0.01$, $n = 3$, and $m = 6$.

In Figure 4, we plot the solution to two instances of the RDTA problem solved
using the branch and bound solver. Notice that the defenders work together and
do not greedily pursue the attackers that are closest. For example, in the
figure on the left, defenders 2 and 3 ignore the closest attackers and pursue
attackers farther away for the benefit of the team.
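The loop of Table 2 can be sketched generically, with node expansion, the
upper bound, and the lower bound passed in as callables. Two details are our
assumptions rather than statements of the text: open nodes are kept in a
single priority queue ordered by upper-bound cost (one way to realize the
branching function of Section 4.2), and a node that survives the pruning test
is expanded. The toy problem at the end (ordering jobs on one machine to
minimize total completion time) is a hypothetical stand-in for RDTA, chosen
only because its optimum, shortest job first, is easy to check.

```python
import heapq
import itertools
from typing import Callable, Iterable, List, Tuple

Node = tuple  # a partial assignment; its concrete form is up to the caller


def branch_and_bound(
    root: Node,
    expand: Callable[[Node], Iterable[Node]],
    upper_bound: Callable[[Node], Tuple[float, Node]],
    lower_bound: Callable[[Node], float],
) -> Tuple[float, Node]:
    """Best-first branch and bound following the steps of Table 2."""
    # Steps 1-3: the greedy completion of the empty assignment is the first incumbent.
    best_cost, best_assignment = upper_bound(root)
    tie = itertools.count()  # tie-breaker so the heap never compares nodes
    open_nodes: List[tuple] = []

    def push_children(node: Node) -> None:
        for child in expand(node):
            cost, completion = upper_bound(child)  # step 7, computed once per node
            heapq.heappush(open_nodes, (cost, next(tie), child, completion))

    push_children(root)  # step 4
    while open_nodes:  # step 5
        cost, _, node, completion = heapq.heappop(open_nodes)  # step 6
        if cost < best_cost:  # step 8
            best_cost, best_assignment = cost, completion
        if lower_bound(node) >= best_cost:  # steps 9-10: prune node and descendants
            continue
        push_children(node)  # assumption: surviving nodes are expanded
    return best_cost, best_assignment


# Toy stand-in (hypothetical): nodes are prefixes of a job order.
durations = [3, 1, 2]


def completion_sum(seq):
    t = total = 0
    for j in seq:
        t += durations[j]
        total += t
    return total


def expand(node):
    return [node + (j,) for j in range(len(durations)) if j not in node]


def upper_bound(node):
    # Greedy completion: append the remaining jobs in index order.
    full = node + tuple(j for j in range(len(durations)) if j not in node)
    return completion_sum(full), full


def lower_bound(node):
    # Remaining jobs can only add cost, so the prefix cost is optimistic.
    return completion_sum(node)


print(branch_and_bound((), expand, upper_bound, lower_bound))  # (10, (1, 2, 0))
```

Because the incumbent is updated before the pruning test, stopping the loop
early still returns a feasible assignment, which is the anytime property the
introduction relies on.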

In the remainder of this section we describe the components of the branch 
and bound solver in detail. 

4.1 Node expansion 

Here we describe the node expansion algorithm used to grow a search tree 
that enumerates all possible assignments for the RDTA problem. Each node 
of the tree represents an assignment. Starting from the root node, attackers 
are assigned, forming new nodes, until all complete assignments are generated. 
Each node represents a partial assignment except for the leaves, which represent 
the set of complete assignments. 

Consider the case with one defender $\mathcal{D} = \{d_1\}$ and three attackers
$\mathcal{A} = \{a_1, a_2, a_3\}$. The tree for this case is shown in Figure 5.
To generate this tree,
we start from the root node representing the empty assignment, denoted (). We 
expand the root node generating three children, each representing an assignment 
containing a single attacker to intercept. The children are then expanded, and 
so on, until all possible assignments are generated. 

For multiple defenders, unlike the single defender case, the tree is
unbalanced to avoid redundancies. For example, consider the case with two
defenders $\mathcal{D} = \{d_1, d_2\}$ and two attackers
$\mathcal{A} = \{a_1, a_2\}$. The tree for this case is shown in Figure 6.
Again, each node represents an assignment, but now the assignment is a
sequence of attackers to intercept for each defender in $\mathcal{D}$. In
general, for a defender set $\mathcal{D}$ with $n$ defenders and an attacker
set $\mathcal{A}$ with $m$ attackers, there are $(n + m - 1)!/(n - 1)!$
complete assignments (or leaves in the search tree).

Figure 5: Search tree for the RDTA problem with the defender set
$\mathcal{D} = \{d_1\}$ and the attacker set $\mathcal{A} = \{a_1, a_2, a_3\}$.
Each node of the tree denotes a sequence of attackers to be intercepted by
defender $d_1$. The root node is the empty assignment. The leaves of the tree
give all possible complete assignments of attackers in $\mathcal{A}$.

Figure 6: Search tree for the RDTA problem with the defender set
$\mathcal{D} = \{d_1, d_2\}$ and the attacker set $\mathcal{A} = \{a_1, a_2\}$.
Each node of the tree denotes a sequence of attackers to intercept for
defender $d_1$ and defender $d_2$. The root node is the empty assignment. The
leaves of the tree give all possible complete assignments of attackers in
$\mathcal{A}$.
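The count $(n + m - 1)!/(n - 1)!$ can be checked by brute force: insert the
attackers one at a time into every position of every defender's sequence and
collect the distinct results. This enumeration is our own check, not the node
expansion algorithm of the paper, and the function names are hypothetical.

```python
from math import factorial
from typing import Set, Tuple

Assignment = Tuple[Tuple[int, ...], ...]  # one intercept sequence per defender


def num_complete_assignments(n: int, m: int) -> int:
    """(n + m - 1)! / (n - 1)! complete assignments for n defenders, m attackers."""
    return factorial(n + m - 1) // factorial(n - 1)


def enumerate_complete_assignments(n: int, m: int) -> Set[Assignment]:
    """Insert attackers 0..m-1 one at a time into every position of every
    defender's sequence, keeping only distinct assignments."""
    assignments: Set[Assignment] = {tuple(() for _ in range(n))}
    for attacker in range(m):
        grown: Set[Assignment] = set()
        for assignment in assignments:
            for d, seq in enumerate(assignment):
                for pos in range(len(seq) + 1):
                    new_seq = seq[:pos] + (attacker,) + seq[pos:]
                    grown.add(assignment[:d] + (new_seq,) + assignment[d + 1:])
        assignments = grown
    return assignments


for n, m in [(1, 3), (2, 2), (3, 4)]:
    print(n, m, num_complete_assignments(n, m), len(enumerate_complete_assignments(n, m)))
```

For the trees of Figures 5 and 6 both counts are 6; for three defenders and
four attackers they are already 360, which illustrates why exhaustive search
scales poorly.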

To generate a search tree for the general case, we use a node expansion al- 
gorithm. This algorithm takes any node and generates the node's children. The 
assignment for each child is constructed by appending an unassigned attacker 
to one of the sequences in the parent node's assignment. The task sequences in 
the parent's assignment are always subsequences of the sequences in its child's 
assignment. Therefore, when we prune a node from the search tree, we can also 
prune all of its descendants. 

The node expansion algorithm uses a different representation for an
assignment than we have used thus far. We introduce this new representation
with an example involving the defender set $\mathcal{D} = \{d_1, d_2\}$ and the
attacker set $\mathcal{A} = \{a_1, a_2, \ldots, a_7\}$. Consider a partial
assignment given by

$$\alpha_{d_1} = (a_4, a_1), \qquad \alpha_{d_2} = (a_2, a_5, a_7).$$

In this case, attackers $a_3$ and $a_6$ have yet to be assigned. Our node
expansion algorithm represents this partial assignment with the vectors

$$\delta = (1, 1, 2, 2, 2, 0, 0), \qquad \beta = (4, 1, 2, 5, 7, 0, 0), \tag{11}$$

both of length $m = 7$. Vector $\delta$ holds defender indices and vector
$\beta$ holds attacker indices. For a unique representation, the nonzero
elements of $\delta$ are ordered so that $\delta(i) \le \delta(i+1)$. For the
example case, attackers $a_{\beta(1)}$ and $a_{\beta(2)}$ (i.e., $a_4$ and
$a_1$) are assigned to defender $d_1$ in sequence, and attackers
$a_{\beta(3)}$, $a_{\beta(4)}$, $a_{\beta(5)}$ (i.e., $a_2$, $a_5$, $a_7$) are
assigned to defender $d_2$ in sequence.

In general, the input to the node expansion algorithm is a parent node with
assignment given by

$$\begin{aligned} \text{parent}.\delta &= (\delta(1), \delta(2), \ldots, \delta(p), 0, \ldots, 0) \\ \text{parent}.\beta &= (\beta(1), \beta(2), \ldots, \beta(p), 0, \ldots, 0), \end{aligned} \tag{12}$$

where both vectors are of size $m$, and $p$ is the number of tasks already
assigned (or the number of nonzero entries in each vector). The output is a set
of $N_{\text{child}}$ children, where

$$N_{\text{child}} = (n - \delta(p) + 1)(m - p). \tag{13}$$

Each child has assignment vectors $\delta$ and $\beta$ identical to its parent
except for entries $\delta(p+1)$ and $\beta(p+1)$. In the child's assignment,
attacker $a_{\beta(p+1)}$ is appended to defender $d_{\delta(p+1)}$'s sequence
of attackers to intercept, $\alpha_{d_{\delta(p+1)}}$. The details of the node
expansion algorithm are given in Table 3.

To demonstrate the node expansion algorithm, we expand the node given by
equation (11), as shown in Figure 7. Figure 8 shows this expansion in our
original notation. Using this algorithm, we can grow the assignment tree for
any RDTA problem. In Figure 9, we show the tree for the two-defender,
two-attacker example written in our node expansion algorithm's notation.
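Table 3 translates almost line for line into Python. The sketch below uses the
$(\delta, \beta)$ vectors with 1-based indices and 0 for unused entries; the
convention $\delta(0) := 1$ at the root (so the root has $nm$ children,
consistent with equation (13)) and the helper names `expand_node`,
`to_sequences`, and `count_leaves` are our assumptions.

```python
from typing import List, Sequence, Tuple

Vectors = Tuple[Tuple[int, ...], Tuple[int, ...]]  # (delta, beta), 1-based, 0 = unused


def expand_node(delta: Sequence[int], beta: Sequence[int], n: int) -> List[Vectors]:
    """Children of a node, following Table 3."""
    m = len(beta)
    p = sum(1 for b in beta if b != 0)    # tasks already assigned
    start = delta[p - 1] if p > 0 else 1  # delta(p); assumption: delta(0) := 1 at the root
    assigned = set(beta[:p])
    children: List[Vectors] = []
    for i in range(start, n + 1):         # defenders delta(p), ..., n
        for j in range(1, m + 1):         # attackers not yet assigned
            if j in assigned:
                continue
            child_delta, child_beta = list(delta), list(beta)
            child_delta[p], child_beta[p] = i, j  # entries delta(p+1), beta(p+1)
            children.append((tuple(child_delta), tuple(child_beta)))
    return children


def to_sequences(delta: Sequence[int], beta: Sequence[int], n: int) -> List[List[int]]:
    """Convert (delta, beta) back to one attacker sequence per defender."""
    seqs: List[List[int]] = [[] for _ in range(n)]
    for d, a in zip(delta, beta):
        if d:
            seqs[d - 1].append(a)
    return seqs


def count_leaves(n: int, m: int) -> int:
    """Grow the full tree from the empty assignment and count complete assignments."""
    stack = [((0,) * m, (0,) * m)]
    leaves = 0
    while stack:
        delta, beta = stack.pop()
        children = expand_node(delta, beta, n)
        if children:
            stack.extend(children)
        else:
            leaves += 1
    return leaves


# The node of equation (11): two children, attacker a3 or a6 appended to d2.
for child_delta, child_beta in expand_node((1, 1, 2, 2, 2, 0, 0), (4, 1, 2, 5, 7, 0, 0), n=2):
    print(child_delta, child_beta, to_sequences(child_delta, child_beta, 2))
```

Restricting $i$ to start at $\delta(p)$ is what removes the redundancy: an
attacker can never be appended to a defender with a smaller index than the
last one used, so each complete assignment appears as exactly one leaf.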
Table 3: Node expansion algorithm

1:  $k := 1$
2:  for $i = \delta(p), \delta(p) + 1, \ldots, n$ do
3:    for each $j$ in the set $\{1, \ldots, m\} - \{\beta(1), \beta(2), \ldots, \beta(p)\}$ do
4:      $\text{child}(k).\delta := \text{parent}.\delta$
5:      $\text{child}(k).\beta := \text{parent}.\beta$
6:      $\text{child}(k).\delta(p + 1) := i$
7:      $\text{child}(k).\beta(p + 1) := j$
8:      $k := k + 1$
9:    end for
10: end for
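The two nested loops of Table 3 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `expand_node` is hypothetical, the vectors are 0-indexed Python tuples holding the 1-indexed defender and attacker labels used in the text, and at the root ($p = 0$) we assume the defender loop starts at 1, so that the root has $nm$ children.

```python
from typing import List, Tuple

Node = Tuple[Tuple[int, ...], Tuple[int, ...]]  # (delta, beta), zero-padded to length m


def expand_node(delta: Tuple[int, ...], beta: Tuple[int, ...], n: int, m: int) -> List[Node]:
    """Return the children of the partial assignment (delta, beta), following Table 3."""
    p = sum(1 for x in delta if x != 0)   # number of tasks already assigned
    if p == m:
        return []                          # complete assignment: a leaf of the tree
    first = delta[p - 1] if p > 0 else 1   # assumption: the root behaves as if delta(0) = 1
    assigned = set(beta[:p])
    children: List[Node] = []
    for i in range(first, n + 1):          # defenders delta(p), ..., n keep delta ordered
        for j in range(1, m + 1):          # only attackers not already in beta
            if j in assigned:
                continue
            child_delta, child_beta = list(delta), list(beta)
            child_delta[p] = i             # entry delta(p + 1) in the text's 1-indexing
            child_beta[p] = j              # entry beta(p + 1)
            children.append((tuple(child_delta), tuple(child_beta)))
    return children
```

Assuming $n = 2$ defenders, as in Figure 8, `expand_node((1, 1, 2, 2, 2, 0, 0), (4, 1, 2, 5, 7, 0, 0), n=2, m=7)` returns exactly the two children of Figure 7, in agreement with equation (13): $(2 - 2 + 1)(7 - 5) = 2$.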





[Figure 7: parent node $\delta = (1, 1, 2, 2, 2, 0, 0)$, $\beta = (4, 1, 2, 5, 7, 0, 0)$, with two children: $\delta = (1, 1, 2, 2, 2, 2, 0)$, $\beta = (4, 1, 2, 5, 7, 3, 0)$ and $\delta = (1, 1, 2, 2, 2, 2, 0)$, $\beta = (4, 1, 2, 5, 7, 6, 0)$.]

Figure 7: The node from equation (11), written in node expansion format, is
expanded using the node expansion algorithm in Table 3.



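[Figure 8: parent node $\alpha_{d_1} = (a_4, a_1)$, $\alpha_{d_2} = (a_2, a_5, a_7)$, with two children: $\alpha_{d_1} = (a_4, a_1)$, $\alpha_{d_2} = (a_2, a_5, a_7, a_3)$ and $\alpha_{d_1} = (a_4, a_1)$, $\alpha_{d_2} = (a_2, a_5, a_7, a_6)$.]

Figure 8: The expansion in Figure 7 in our original notation.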
Figure 9: The assignment tree for the two-vehicle, two-attacker example (shown
earlier in our original notation), written using our node expansion algorithm's notation.




Figure 10: Example search tree used to illustrate the branching routine in the 
branch and bound solver. 



4.2 Search algorithm 

To determine the order in which we expand nodes, we have tried several tree
search algorithms, including the systematic search algorithms breadth-first search
(BFS) and depth-first search (DFS) [5], and A* search. The A* search algorithm
orders nodes according to a heuristic branching function to help guide the search
toward the optimal assignment. We use the upper bound algorithm presented
in Section 4.3 as the branching function. The lower bound algorithm presented
in Section 4.4 could also be used as the branching function.

For example, consider a tree with three levels, where node $i$ is labeled $n_i$ as
shown in Figure 10. For this tree, BFS gives the ordering
$(n_0, n_1, n_2, n_3, n_4, n_5, n_6, n_7, n_8, n_9)$, and DFS gives the ordering
$(n_0, n_1, n_4, n_5, n_2, n_6, n_7, n_3, n_8, n_9)$. Suppose the upper bound algorithm
run at each node $i$ gives the following results: $J_{ub}(n_1) = 3$, $J_{ub}(n_2) = 1$,
$J_{ub}(n_3) = 2$, $J_{ub}(n_4) = 2$, $J_{ub}(n_5) = 1$, $J_{ub}(n_6) = 1$, $J_{ub}(n_7) = 1$,
$J_{ub}(n_8) = 1$, $J_{ub}(n_9) = 0$. Visiting children in order of increasing $J_{ub}$ (ties
broken by node index), A* BFS gives the ordering $(n_0, n_2, n_3, n_1, n_6, n_7, n_9, n_8, n_5, n_4)$,
and A* DFS gives the ordering $(n_0, n_2, n_6, n_7, n_3, n_9, n_8, n_1, n_5, n_4)$.
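The four orderings above can be reproduced with a short Python sketch. The tree structure (root $n_0$ with children $n_1, n_2, n_3$; $n_1 \to n_4, n_5$; $n_2 \to n_6, n_7$; $n_3 \to n_8, n_9$) is our reading of Figure 10, reconstructed from the BFS and DFS orderings given in the text. The A* variants visit children in order of increasing $J_{ub}$, and Python's stable `sorted` keeps tied children in index order, which matches the orderings stated above. The dictionary and function names are illustrative only.

```python
from collections import deque
from typing import Dict, List

# Tree of Figure 10, as reconstructed from the BFS/DFS orderings in the text.
CHILDREN: Dict[int, List[int]] = {0: [1, 2, 3], 1: [4, 5], 2: [6, 7], 3: [8, 9]}
# Upper bounds J_ub(n_i) from the example (the root's value is never needed).
J_UB: Dict[int, float] = {1: 3, 2: 1, 3: 2, 4: 2, 5: 1, 6: 1, 7: 1, 8: 1, 9: 0}


def ordered_children(node: int, use_bound: bool) -> List[int]:
    kids = CHILDREN.get(node, [])
    # A* variants: smallest upper bound first; sorted() is stable, so children
    # with equal bounds stay in index order.
    return sorted(kids, key=lambda c: J_UB[c]) if use_bound else list(kids)


def bfs_order(use_bound: bool = False) -> List[int]:
    order, queue = [], deque([0])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(ordered_children(node, use_bound))
    return order


def dfs_order(use_bound: bool = False) -> List[int]:
    order, stack = [], [0]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push in reverse so that the first child in the ordering is expanded next.
        stack.extend(reversed(ordered_children(node, use_bound)))
    return order
```

Calling `bfs_order()`, `dfs_order()`, `bfs_order(use_bound=True)`, and `dfs_order(use_bound=True)` yields the BFS, DFS, A* BFS, and A* DFS orderings listed above, with node $n_i$ represented by the integer $i$.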

In A* search, the children of a node must be sorted with respect to the
branching function. The largest set of children emanating from any node is the
$nm$ children of the root node, so at most $nm$ items need to be sorted. To sort
the children, we use Shell's method, which runs in $O((nm)^{3/2})$ time.
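For reference, a minimal Shell sort that orders a node's children by their branching-function value might look like the following. The gap sequence $1, 4, 13, 40, \ldots$ (Knuth's $h \leftarrow 3h + 1$) is our assumption, since the text does not say which gaps were used; with this sequence the worst-case running time is $O(N^{3/2})$ for $N$ items. Shell sort is not stable, so children with equal bounds may come out in a different order than in the tie-breaking convention of the example above.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def shell_sort(items: Sequence[T], key: Callable[[T], float]) -> List[T]:
    """Return a new list of items sorted by key (ascending) using Shell's method."""
    a = list(items)
    h = 1
    while h < len(a) // 3:
        h = 3 * h + 1                     # largest gap of 1, 4, 13, 40, ... below N/3
    while h >= 1:
        # h-sort the list: insertion sort on elements that are h positions apart
        for i in range(h, len(a)):
            x = a[i]
            j = i
            while j >= h and key(a[j - h]) > key(x):
                a[j] = a[j - h]
                j -= h
            a[j] = x
        h //= 3                           # next smaller gap in the sequence
    return a
```

For instance, sorting the root's children $(n_1, 3), (n_2, 1), (n_3, 2)$ by their bound gives $n_2, n_3, n_1$, the first three nodes visited after the root by A* search.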






4.3 Upper bound algorithm 

In this section, we describe a fast algorithm that generates a feasible complete 
assignment given any partial assignment. The cost of the resulting complete 
assignment is an upper bound on the optimal cost that can be achieved from 
the given partial assignment. The idea behind the upper bound algorithm is 
to assign unassigned attackers greedily. At each step, we assign the
attacker-defender pair that results in the minimum intercept time. We proceed
until all attackers are assigned or until none of the remaining attackers can be
intercepted before entering the Defense Zone. The details of this algorithm,
which runs in $O(nm^2)$ time, are listed in Table 4.

The input to the algorithm is a partial assignment given by an intercept
sequence $\alpha_d$ for each defender $d$ in $\mathcal{D}$ such that the set of unassigned attackers
$\mathcal{A}_u$ is not empty. In addition, we take as inputs the variables associated with this
partial assignment, including the time for defender $d$ to complete its intercept
sequence $\alpha_d$, given by $t_d(m_d)$, and the binary variable $\gamma_a$ for each $a$ in the set of
assigned attackers $\mathcal{A} - \mathcal{A}_u$.

Given a partial assignment, the greedy step of the algorithm determines the
attacker in the set $\mathcal{A}_u$ that can be intercepted in the minimum amount of time,
denoted $a^*$. The corresponding defender that intercepts $a^*$ is denoted $d^*$. To
determine this defender-attacker pair $(d^*, a^*)$, we form a matrix $C$ of intercept
times. The matrix has size $|\mathcal{D}| \times |\mathcal{A}_u|$, and its elements are given by

$$c(d,a) := t_d(m_d) + \Delta t_{\text{int}}(d, a, t_d(m_d)), \tag{14}$$

for each $d$ in $\mathcal{D}$ and $a$ in $\mathcal{A}_u$. The element $c(d,a)$ is the time it would take
defender $d$ to complete its intercept sequence $\alpha_d$ and then intercept attacker $a$.
The minimum of these times gives the desired defender-attacker pair:

$$c(d^*, a^*) = \min_{d \in \mathcal{D},\, a \in \mathcal{A}_u} c(d,a). \tag{15}$$

If $c(d^*, a^*) = \infty$, no attacker can be intercepted before it enters the Defense
Zone. Thus, we set $\gamma_a := 1$ for each $a$ in $\mathcal{A}_u$. Then, we set $\mathcal{A}_u$ to the empty
set because all attackers are effectively assigned, and we use equation (10) to
calculate the upper bound $J_{ub}$. Otherwise, $c(d^*, a^*)$ is finite, and we add attacker
$a^*$ to defender $d^*$'s intercept sequence by incrementing $m_{d^*}$ by one, setting
$\alpha_{d^*}(m_{d^*}) := a^*$, and updating the finishing time $t_{d^*}(m_{d^*}) := c(d^*, a^*)$.
Then, because $a^*$ has now been assigned, we remove it from the set of unassigned
attackers by setting $\mathcal{A}_u := \mathcal{A}_u - \{a^*\}$. If $\mathcal{A}_u$ is not empty, we have a new partial
assignment, and we repeat the procedure. Otherwise, the assignment is complete,
and we use equation (10) to compute the upper bound $J_{ub}$.
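A compact Python rendering of the greedy procedure of Table 4 is sketched below under explicit assumptions; it is not the authors' code. The callable `dt_int(d, a, t0)` is a hypothetical stand-in for $\Delta t_{\text{int}}$ that returns `math.inf` when defender `d`, starting at time `t0`, cannot reach attacker `a` before it enters the Defense Zone, and the toy durations and deadlines are invented for illustration. Because equation (10) is not reproduced in this section, `assumed_cost` takes it to have the form $\sum_a \gamma_a + \epsilon \max_d t_d(m_d)$; substitute the actual cost function. Ties in the minimum are broken by sorted defender and attacker labels.

```python
import math
from typing import Callable, Dict, List, Set, Tuple

INF = math.inf


def assumed_cost(gamma: Dict[str, int], finish: Dict[str, float], eps: float) -> float:
    # ASSUMPTION: stand-in for equation (10): attackers that enter + eps * latest finish time
    return sum(gamma.values()) + eps * max(finish.values(), default=0.0)


def greedy_upper_bound(
    seq: Dict[str, List[str]],                   # intercept sequence alpha_d per defender
    finish: Dict[str, float],                    # t_d(m_d): time d finishes alpha_d
    gamma: Dict[str, int],                       # gamma_a for already-assigned attackers
    unassigned: Set[str],                        # A_u (nonempty)
    dt_int: Callable[[str, str, float], float],  # hypothetical intercept-time model
    eps: float,
) -> Tuple[Dict[str, List[str]], Dict[str, int], float]:
    seq = {d: list(s) for d, s in seq.items()}   # work on copies of the partial assignment
    finish, gamma, remaining = dict(finish), dict(gamma), set(unassigned)
    for a in remaining:
        gamma[a] = 0                                                   # step 2
    c = {(d, a): finish[d] + dt_int(d, a, finish[d])                   # step 3
         for d in seq for a in remaining}
    while remaining:                                                   # step 4
        d_star, a_star = min(((d, a) for d in sorted(seq) for a in sorted(remaining)),
                             key=lambda da: c[da])                     # step 5
        if c[(d_star, a_star)] == INF:                                 # step 6
            break
        seq[d_star].append(a_star)                                     # step 7
        finish[d_star] = c[(d_star, a_star)]                           # step 8
        remaining.discard(a_star)           # step 9: a* no longer enters the minimum
        for a in remaining:                 # step 10: only row d* of C changes
            c[(d_star, a)] = finish[d_star] + dt_int(d_star, a, finish[d_star])
    for a in remaining:
        gamma[a] = 1                        # step 12: these attackers enter the zone
    return seq, gamma, assumed_cost(gamma, finish, eps)                # step 13


# Hypothetical toy model: fixed intercept durations and Defense Zone arrival deadlines.
DURATION = {("d1", "a1"): 1.0, ("d1", "a2"): 2.0, ("d1", "a3"): 5.0,
            ("d2", "a1"): 3.0, ("d2", "a2"): 1.5, ("d2", "a3"): 4.0}
DEADLINE = {"a1": 10.0, "a2": 10.0, "a3": 4.5}


def toy_dt_int(d: str, a: str, t0: float) -> float:
    dt = DURATION[(d, a)]
    return dt if t0 + dt <= DEADLINE[a] else INF
```

Starting from empty sequences with `eps=0.01`, the greedy step first gives $a_1$ to $d_1$ (time 1) and then $a_2$ to $d_2$ (time 1.5), after which neither defender can reach $a_3$ before its deadline. The function therefore returns $\gamma_{a_3} = 1$ and, under the assumed cost, $J_{ub} = 1 + 0.01 \times 1.5 = 1.015$.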

Table 4: Greedy upper bound algorithm

1:  Given a partial assignment: intercept sequence $\alpha_d$ for each $d \in \mathcal{D}$, a nonempty
    set of unassigned attackers $\mathcal{A}_u$, $t_d(m_d)$ for each $d \in \mathcal{D}$, and $\gamma_a$ for each
    $a \in \mathcal{A} - \mathcal{A}_u$.
2:  Initialize variables for unassigned attackers. Set $\gamma_a := 0$ for each $a$ in $\mathcal{A}_u$.
3:  Calculate the elements of matrix $C$. For all $d \in \mathcal{D}$ and $a \in \mathcal{A}_u$, set
    $c(d,a) := t_d(m_d) + \Delta t_{\text{int}}(d, a, t_d(m_d))$.
4:  while $\mathcal{A}_u$ not empty do
5:    Find the minimum element of $C$, given by $c(d^*, a^*) = \min_{d \in \mathcal{D},\, a \in \mathcal{A}_u} c(d,a)$.
6:    If $c(d^*, a^*) = \infty$, no attacker in the set $\mathcal{A}_u$ can be intercepted before
      entering the Defense Zone. Break out of the while loop.
7:    Append attacker $a^*$ to defender $d^*$'s assignment by setting $m_{d^*} := m_{d^*} + 1$
      and $\alpha_{d^*}(m_{d^*}) := a^*$.
8:    Update the finishing time for $d^*$ by setting $t_{d^*}(m_{d^*}) := c(d^*, a^*)$.
9:    Remove $a^*$ from consideration since it has been assigned. Set $c(d, a^*)$ to $\infty$
      for all $d \in \mathcal{D}$, and set $\mathcal{A}_u := \mathcal{A}_u - \{a^*\}$.
10:   Update the matrix for defender $d^*$. For all attackers $a \in \mathcal{A}_u$, set
      $c(d^*, a) := t_{d^*}(m_{d^*}) + \Delta t_{\text{int}}(d^*, a, t_{d^*}(m_{d^*}))$.
11: end while
12: For each $a$ in $\mathcal{A}_u$, set $\gamma_a := 1$.
13: Compute the upper bound $J_{ub}$ using equation (10).

4.4 Lower bound algorithm

Here we describe a fast algorithm that generates a lower bound on the cost
that can be achieved from any given partial assignment. The idea behind the
algorithm is to use the principle of simultaneity. In assigning attackers from
$\mathcal{A}_u$, we assume each defender can pursue multiple attackers simultaneously.
The result is a potentially infeasible complete assignment because simultaneity
is physically impossible. Because no feasible assignment can do better, the cost
of this assignment is a lower bound on the optimal cost that can be achieved
from the given partial assignment. The algorithm, which runs in $O(nm)$ time,
is listed in Table 5.

Similar to the upper bound algorithm, the input to the lower bound algorithm
is a partial assignment. This includes an intercept sequence $\alpha_d$ for each
defender $d$ in $\mathcal{D}$ with $\mathcal{A}_u$ nonempty, $t_d(m_d)$ for each defender $d$ in $\mathcal{D}$, and $\gamma_a$ for
each attacker $a$ in $\mathcal{A} - \mathcal{A}_u$.

Each attacker $a$ in $\mathcal{A}_u$ is assigned a defender as follows. Form a matrix $C$
with elements

$$c(d,a) := t_d(m_d) + \Delta t_{\text{int}}(d, a, t_d(m_d)), \tag{16}$$

for all $d$ in $\mathcal{D}$ and $a$ in $\mathcal{A}_u$. Element $c(d,a)$ is equal to the time it takes $d$ to
intercept the attackers in its intercept sequence $\alpha_d$ plus the time it would take to
subsequently intercept attacker $a$. For each $a$ in $\mathcal{A}_u$, find the defender, denoted
$d^*$, that can intercept $a$ in minimal time:

$$c(d^*, a) = \min_{d \in \mathcal{D}} c(d,a). \tag{17}$$

If $c(d^*, a) = \infty$, we set $\gamma_a := 1$ because no defender can intercept attacker $a$
before it enters the Defense Zone. Otherwise, we set $\gamma_a := 0$ because defender
$d^*$ can intercept attacker $a$ before it enters the Defense Zone. The lower bound
is therefore given by

$$J_{lb} := \sum_{a \in \mathcal{A}} \gamma_a + \epsilon \max_{a \in \mathcal{A}_u:\, \gamma_a = 0} \left( \min_{d \in \mathcal{D}} c(d,a) \right). \tag{18}$$
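The simultaneity bound of equation (18) can be sketched in the same style. As an assumption for illustration, `dt_int` is a hypothetical intercept-time model that returns `math.inf` when an intercept is impossible, and the toy durations and deadlines are invented. The sum in (18) runs over the $\gamma_a$ of both the previously assigned and the newly considered attackers; when no unassigned attacker can be intercepted, we take the max in (18) to be zero, which is our own convention.

```python
import math
from typing import Callable, Dict, Iterable, List, Tuple

INF = math.inf


def simultaneity_lower_bound(
    finish: Dict[str, float],                    # t_d(m_d) for each defender d
    gamma: Dict[str, int],                       # gamma_a for already-assigned attackers
    unassigned: Iterable[str],                   # A_u
    dt_int: Callable[[str, str, float], float],  # hypothetical intercept-time model
    eps: float,
) -> Tuple[Dict[str, int], float]:
    gamma = dict(gamma)
    earliest: List[float] = []                   # min_d c(d, a) for interceptable attackers
    for a in unassigned:
        # Simultaneity: every defender may chase every attacker at once, so each
        # attacker takes the smallest entry in its column of C, eqs. (16)-(17).
        t_best = min(finish[d] + dt_int(d, a, finish[d]) for d in finish)
        if t_best == INF:
            gamma[a] = 1                         # no defender reaches a before the zone
        else:
            gamma[a] = 0
            earliest.append(t_best)
    # Equation (18); an empty max is taken to be 0 (our convention).
    j_lb = sum(gamma.values()) + eps * max(earliest, default=0.0)
    return gamma, j_lb


# Hypothetical toy model: fixed intercept durations and Defense Zone arrival deadlines.
DURATION = {("d1", "a1"): 1.0, ("d1", "a2"): 2.0, ("d1", "a3"): 5.0,
            ("d2", "a1"): 3.0, ("d2", "a2"): 1.5, ("d2", "a3"): 4.0}
DEADLINE = {"a1": 10.0, "a2": 10.0, "a3": 4.5}


def toy_dt_int(d: str, a: str, t0: float) -> float:
    dt = DURATION[(d, a)]
    return dt if t0 + dt <= DEADLINE[a] else INF
```

With both defenders idle at time 0 and `eps=0.01`, each attacker is reachable by some defender when chased in isolation (at times 1, 1.5, and 4), so all $\gamma_a = 0$ and $J_{lb} = 0.01 \times 4 = 0.04$. In this toy instance the bound is attained by a feasible plan: $d_2$ intercepts $a_3$ at time 4 while $d_1$ intercepts $a_1$ and then $a_2$ by time 3.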



5 Analysis of the solver 

In this section, we explore the average case computational complexity of the
branch and bound algorithm by solving randomly generated instances. Each
instance is generated by randomly selecting parameters from a uniform distribution
over the intervals defined below. The computations were performed on a
PC with an Intel PIII 550 MHz processor, 1024 KB cache, 3.8 GB RAM, and Linux.
For all instances solved, processor speed, not memory, was the limiting factor.

5.1 Generating random instances 

The initial position of each attacker is taken to be in an annulus centered on
the playing field. The radius of the initial position, denoted $r_a$, is chosen at
random from a uniform distribution over the interval $[r_a^{\min}, r_a^{\max}]$. The angle of
the initial position, denoted $\theta_a$, is chosen from a uniform distribution over the
interval $(0, 2\pi]$ (all other angles used in this section, $\phi_a$, $\theta_d$, and $\phi_d$, are also
chosen from a uniform distribution over the interval $(0, 2\pi]$). The magnitude of
attacker $a$'s velocity, denoted $v_a$, is chosen at random from a uniform distribution
over the interval $[v_a^{\min}, v_a^{\max}]$. The initial state of the attacker is given by

$$\begin{aligned} p(0) &= r_a \cos\theta_a, & q(0) &= r_a \sin\theta_a, \\ \dot p(0) &= v_a \cos\phi_a, & \dot q(0) &= v_a \sin\phi_a. \end{aligned} \tag{19}$$

The initial position of each defender is taken to be in a smaller annulus, also
centered on the playing field. The radius of the initial position, denoted $r_d$,
is chosen at random from a uniform distribution over the interval $[r_d^{\min}, r_d^{\max}]$.
The magnitude of defender $d$'s velocity, denoted $v_d$, is chosen at random from
a uniform distribution over the interval $[v_d^{\min}, v_d^{\max}]$. The initial state of the
defender is given by

$$\begin{aligned} x(0) &= r_d \cos\theta_d, & y(0) &= r_d \sin\theta_d, \\ \dot x(0) &= v_d \cos\phi_d, & \dot y(0) &= v_d \sin\phi_d. \end{aligned} \tag{20}$$

For the instances generated in this paper, we set $R_{dz} = 2.0$ and take
the parameters from the following intervals: $r_a \in [7.5, 15.0]$, $v_a = 1.0$,
$r_d \in [\sqrt{2} R_{dz}, 2\sqrt{2} R_{dz}]$, and $v_d \in [0.5, 1.0]$. In Section 5.3 we study the RDTA
problem with variations in the velocity parameters $v_a$ and $v_d^{\max}$.

Table 5: Lower bound algorithm

1: Given a partial assignment: intercept sequence $\alpha_d$ for each $d \in \mathcal{D}$, a nonempty
   set of unassigned attackers $\mathcal{A}_u$, $t_d(m_d)$ for each $d \in \mathcal{D}$, and $\gamma_a$ for each
   $a \in \mathcal{A} - \mathcal{A}_u$.
2: Calculate the elements of matrix $C$. For all $d \in \mathcal{D}$ and $a \in \mathcal{A}_u$, set
   $c(d,a) := t_d(m_d) + \Delta t_{\text{int}}(d, a, t_d(m_d))$.
3: for all $a \in \mathcal{A}_u$ do
4:   Find the minimum element of the $a$th column of $C$, given by
     $c(d^*, a) = \min_{d \in \mathcal{D}} c(d,a)$.
5:   if $c(d^*, a) = \infty$ then set $\gamma_a := 1$
6:   else set $\gamma_a := 0$
7: end for
8: Set $J_{lb} := \sum_{a \in \mathcal{A}} \gamma_a + \epsilon \max_{a \in \mathcal{A}_u:\, \gamma_a = 0} \min_{d \in \mathcal{D}} c(d,a)$.
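The instance generator of Section 5.1 is easy to reproduce. The Python sketch below draws initial states according to equations (19) and (20) with the parameter values listed above; the function names, the use of `random.Random.uniform`, and the state layout (position, then velocity components) are our own choices. Note that `uniform(0, 2π)` samples $[0, 2\pi)$ up to floating-point rounding rather than $(0, 2\pi]$, a difference of probability zero.

```python
import math
import random
from typing import List, Tuple

State = Tuple[float, float, float, float]   # (x, y, x_dot, y_dot) at time 0

R_DZ = 2.0                                   # Defense Zone radius used in Section 5.1


def _annulus_state(rng: random.Random, radius: float, speed: float) -> State:
    theta = rng.uniform(0.0, 2.0 * math.pi)  # angle of the initial position
    phi = rng.uniform(0.0, 2.0 * math.pi)    # direction of the initial velocity
    return (radius * math.cos(theta), radius * math.sin(theta),
            speed * math.cos(phi), speed * math.sin(phi))


def random_instance(
    n: int, m: int, rng: random.Random,
    r_a: Tuple[float, float] = (7.5, 15.0),
    v_a: float = 1.0,
    r_d: Tuple[float, float] = (math.sqrt(2) * R_DZ, 2 * math.sqrt(2) * R_DZ),
    v_d: Tuple[float, float] = (0.5, 1.0),
) -> Tuple[List[State], List[State]]:
    """Return (defender states, attacker states) following equations (19)-(20)."""
    attackers = [_annulus_state(rng, rng.uniform(*r_a), v_a) for _ in range(m)]
    defenders = [_annulus_state(rng, rng.uniform(*r_d), rng.uniform(*v_d)) for _ in range(n)]
    return defenders, attackers
```

Passing a seeded `random.Random(seed)` makes a batch of instances repeatable, which is convenient when several solver variants are compared on the same random instances.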

5.2 Average case computational complexity 

In this section, we present the results of an average case computational
complexity study on the branch and bound solver. A particular problem instance is
considered solved when the strategy that minimizes the cost is found. In Figure 11
we plot the fraction of instances solved versus computation time. In the top
panel, the cost function is the number of attackers that enter the Defense
Zone ($\epsilon = 0$ in equation (10)). Solving these instances becomes computationally
intensive for modest-size problems. For example, when n = 3 and m = 5, 80%
of the instances are solved in 60 seconds or less. In the bottom panel, in addition
to the primary component, the cost function includes a secondary component
($\epsilon = 0.01$ in equation (10)). The secondary component is the time it takes to
intercept all attackers that can be intercepted. Solving these instances of the
problem is more computationally intensive than the $\epsilon = 0$ case. For example,
when n = 3 and m = 5, only 40% of the problems are solved in 60 seconds or less.

The increase in average case computational complexity for the $\epsilon > 0$ case
is expected because the cost function has an additional component to be
minimized, which is independent of the primary component. Even when the
primary component is at its minimum, the algorithm must proceed until it proves
that the combination of primary and secondary components is minimized.

If it is given enough time, the branch and bound solver finds the optimal
assignment, but the average case computational complexity is high. Therefore,
using the algorithm to solve for the optimal assignment in real time is infeasible
for most applications. However, the best assignment found in the time window
allotted for planning could be used in place of the optimal assignment. In
this case, it is desirable that the algorithm converge to a near-optimal solution
quickly.

Figure 11: The fraction of instances solved versus computation time for the
branch and bound solver. On top, the cost is the number of attackers that enter
the Defense Zone ($\epsilon = 0$ in equation (10)), and on bottom, the cost includes a
secondary component ($\epsilon = 0.01$ in equation (10)). For each curve, 400 random
instances of the RDTA problem were solved. The values of the parameters are
n = 3 and m = 3, 4, 5.

Figure 12: The average convergence rate for the branch and bound solver using
each of the three branching routines BFS, DFS, and A* search. We plot the
percent difference from optimal PD(k) versus the number of branches k explored.
For each curve, 400 random instances of RDTA were solved. The parameter
values are $\epsilon = 0.01$, n = 3, and m = 5.

To learn more about the convergence rate of the branch and bound solver,
we look at the rate at which the best upper bound $J^{\text{best}}$ decreases with branches
taken in the search tree. Because the branch and bound algorithm is an exact
method, $J^{\text{best}}$ eventually converges to $J_{opt}$. We define the percent difference
from optimal as follows. Let $J_{opt}^{(i)}$ be the optimal cost for instance $i$, and let
$J_{ub}^{(i)}(k)$ be the best upper bound found after $k$ branches for instance $i$. Let $\bar J_{opt}$ be
the mean of the set $\{J_{opt}^{(i)} : i = 1, \ldots, N\}$, and let $\bar J_{ub}(k)$ be the mean of the
set $\{J_{ub}^{(i)}(k) : i = 1, \ldots, N\}$, where $N$ is the number of instances. The percent
difference from optimal is given by

$$PD(k) = 100\,\frac{\bar J_{ub}(k) - \bar J_{opt}}{\bar J_{opt}}. \tag{21}$$

In Figure 12 we plot PD(k) versus the number of branches k for instances
involving three defenders (n = 3) and five attackers (m = 5). At the root node
(k = 1), the greedy algorithm is applied, and no exploration of the tree has
occurred yet. Therefore, the three branching routines produce the same result,
PD(1) = 33%. This means that $\bar J_{ub}(1) - \bar J_{opt} = 0.33\,\bar J_{opt}$, or $\bar J_{ub}(1) = 1.33\,\bar J_{opt}$.
In other words, the average cost of the assignment generated by the greedy
algorithm is 1.33 times the average optimal cost. One branch into the tree
(k = 2), both DFS and BFS generate assignments with PD(2) = 28%, and
A* search generates assignments with PD(2) = 5%. Therefore, after only two
steps, the branch and bound algorithm using A* search generates assignments
whose average cost is only 1.05 times the average optimal cost.
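Equation (21) amounts to a few lines of arithmetic over the per-instance costs. The sketch below is only an illustration of the formula; the function name is ours, and the example costs in the note that follows are made up rather than taken from the study.

```python
from statistics import mean
from typing import Sequence


def percent_difference(ub_costs: Sequence[float], opt_costs: Sequence[float]) -> float:
    """PD(k) from equation (21): ub_costs[i] is J_ub^(i)(k), opt_costs[i] is J_opt^(i)."""
    if len(ub_costs) != len(opt_costs) or not opt_costs:
        raise ValueError("need one upper bound and one optimal cost per instance")
    j_ub_bar = mean(ub_costs)      # mean best upper bound after k branches
    j_opt_bar = mean(opt_costs)    # mean optimal cost
    return 100.0 * (j_ub_bar - j_opt_bar) / j_opt_bar
```

For example, illustrative costs `opt_costs = [1.0, 3.0]` and `ub_costs = [2.0, 4.0]` have means 2 and 3, so PD = 100 × (3 − 2)/2 = 50%, that is, $\bar J_{ub} = 1.5\,\bar J_{opt}$. Because (21) compares means, this differs from averaging the per-instance percent differences (100% and 33% here).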

For the instances solved here, the branch and bound solver with A* search 
converges to the optimal assignment in an average of 8 branches, and it takes 
an average of 740 branches to prove that the assignment is optimal. Therefore, 
the solver converges to the optimal solution quickly, and the computational 
complexity that we observed (Figure 11) is due to the time needed to prove
optimality.

These results are encouraging for real-time implementation of the algorithm.
They show that a very good assignment is generated after a small number
of branches. There is a trade-off between optimality and computation time that
can be tuned by deciding how deep into the tree to explore. Going deeper into
the tree generates assignments that are closer to optimal but also increases the
computational burden. The parameter to be tuned is the maximum number of
branches the search procedure is allowed to explore, denoted kMax.

To study the computational complexity as kMax is tuned, we look at versions
of the algorithm (using A*) with kMax = 1 (greedy algorithm), kMax = 2, and
kMax = ∞ (exact algorithm). These three cases generate assignments with
average percent difference from optimal given by PD(1) = 33%, PD(2) = 5%, and
PD(∞) = 0%, respectively. The results are shown in Figure 13. The algorithm
with kMax = 2 gives a good balance between optimality and computation time.
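The role of kMax can be illustrated with a generic depth-first, A*-ordered branch and bound loop that keeps the best complete assignment found so far and stops after a branch budget. This is a sketch under our own conventions, not the authors' solver. The root's greedy upper bound counts as branch 1 (so `k_max=1` returns the greedy assignment, as in the text), each expansion of a node counts as one more branch, and a node is pruned when its lower bound is no better than the incumbent. To keep the example self-contained and checkable, the callbacks solve a hypothetical toy problem in place of RDTA: assigning jobs with durations `JOBS` to two machines so as to minimize the makespan.

```python
import math
from typing import Callable, Iterable, List, Tuple, TypeVar

N = TypeVar("N")


def anytime_branch_and_bound(
    root: N,
    children: Callable[[N], Iterable[N]],
    upper_bound: Callable[[N], Tuple[float, N]],  # (cost, feasible completion)
    lower_bound: Callable[[N], float],
    k_max: float = math.inf,
) -> Tuple[float, N]:
    best_cost, best = upper_bound(root)       # branch 1: greedy completion of the root
    stack, k = [root], 1
    while stack and k < k_max:
        node = stack.pop()
        if lower_bound(node) >= best_cost:
            continue                          # prune: no completion can beat the incumbent
        k += 1
        scored = []
        for child in children(node):
            cost, completion = upper_bound(child)
            if cost < best_cost:
                best_cost, best = cost, completion   # best feasible assignment so far
            scored.append((cost, child))
        scored.sort(key=lambda s: s[0])       # A*: smallest upper bound ends up on top
        stack.extend(child for _, child in reversed(scored))
    return best_cost, best


# Hypothetical toy problem standing in for RDTA: a node is a tuple of machine
# indices for the first len(node) jobs; the cost of a complete node is its makespan.
JOBS = (3, 3, 2, 2, 2)
MACHINES = 2


def loads(node: Tuple[int, ...]) -> List[int]:
    load = [0] * MACHINES
    for job, machine in enumerate(node):
        load[machine] += JOBS[job]
    return load


def toy_children(node: Tuple[int, ...]) -> List[Tuple[int, ...]]:
    return [] if len(node) == len(JOBS) else [node + (i,) for i in range(MACHINES)]


def toy_upper_bound(node: Tuple[int, ...]) -> Tuple[float, Tuple[int, ...]]:
    load, completion = loads(node), list(node)
    for job in range(len(node), len(JOBS)):   # greedy: least-loaded machine next
        i = min(range(MACHINES), key=load.__getitem__)
        load[i] += JOBS[job]
        completion.append(i)
    return max(load), tuple(completion)


def toy_lower_bound(node: Tuple[int, ...]) -> float:
    return max(max(loads(node)), sum(JOBS) / MACHINES)
```

On the toy problem, `anytime_branch_and_bound((), toy_children, toy_upper_bound, toy_lower_bound, k_max=1)` returns the greedy makespan 7, while `k_max=3` or the default unlimited budget finds the optimal makespan 6 (both jobs of length 3 on one machine, the three jobs of length 2 on the other).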

5.3 Phase Transitions 

The RDTA problem is NP-hard, which can be shown by a reduction from the
traveling salesman problem. This is a worst case result that says nothing about
the average case complexity of the algorithm or the complexity with parameter 
variations. In this section, we study the complexity of the RDTA problem as 
parameters are varied. We perform this study on the decision version of the 
problem. 

RoboFlag Drill Decision Problem (RDD): Given a set of defenders $\mathcal{D}$ and a
set of attackers $\mathcal{A}$, is there a complete assignment such that no attacker enters
the Defense Zone?

First, we consider variations in the ratio of attacker velocity to maximum 
defender velocity, denoted vA/vD in this section. When the ratio is small, the 
defenders are much faster than the attackers. It should be easy to quickly find an 
assignment such that all attackers are intercepted. When the ratio is large, the 
attackers are much faster than the defenders. In this case, it is difficult for the 
defenders to intercept all of the attackers, which should be easy to determine. 

The interesting question is whether there is a transition from being able to
intercept all the attackers (all yes answers to the RDD problem) to not being
able to intercept all attackers (all no answers to the RDD problem). Is this
transition sharp? Are there values of the ratio for which solving the RDD is
difficult?

Figure 13: The fraction of instances solved versus computation time for the
branch and bound solver (using A*) with kMax = 1, 2, and ∞. The kMax
variable controls the maximum number of branches explored. We vary it from
kMax = 1, which is a greedy search, to kMax = ∞, which is exhaustive search.
For each curve, 400 random instances of the RDTA problem were solved. For
these problems, the parameter values are $\epsilon = 0.01$, n = 3, and m = 5.

For each value of the velocity ratio, we generated random instances of the
RDD problem and solved them with the branch and bound solver. The results
are shown in Figure 14. The top panel shows the fraction of instances that
evaluate to yes versus the velocity ratio. The bottom panel shows the mean
number of branches required to solve an instance versus the velocity ratio. There
is a sharp transition from all instances yes to all instances no. This transition
occurs at approximately vA/vD = 1 for the n = 3, m = 5 case. At this value
of the ratio, there is a spike in computational complexity. This easy-hard-easy
behavior is indicative of a phase transition [25].

We also study the RDD problem with variations in the ratio of defenders
to attackers, denoted n/m, with vD = vA = 1. For small values of n/m, the
number of attackers is much larger than the number of defenders, and it should
be easy to determine that the team of defenders cannot intercept all of the
attackers. In this case, most instances should evaluate to no. For large values
of n/m, the number of defenders is much larger than the number of attackers,
and it should be easy to find an assignment in which all attackers are kept out
of the Defense Zone. In this case, most instances should evaluate to yes. The
results, shown in Figure 15, confirm these expectations. Between the extremes
of the n/m ratio, there is a phase transition at a ratio of approximately n/m = 0.65.

Figure 14: The phase transition of the RDD problem in the ratio of attacker
velocity to maximum defender velocity (vA/vD). The top panel shows the
fraction of instances that evaluate to yes versus the velocity ratio. The bottom
panel shows the mean number of branches needed to solve the problem
versus the velocity ratio. The phase transition occurs at a velocity ratio of
approximately 1. For each curve, 100 random instances of the RDD problem
were solved. In these figures, n = 3.

Figure 15: The phase transition of the RDD problem in the ratio of defenders
to attackers (n/m). The solid line shows the fraction of instances that evaluate
to yes versus the ratio. The dashed line shows the mean number of branches
needed to solve the problem versus the ratio. For each curve, 100 random
instances of the RDD problem were solved. The velocities are vD = vA = 1.

In general, these experiments show that when one side dominates the other 
(in terms of the number of vehicles or in terms of the capabilities of the vehicles) 
the RDD problem is easy to solve. When the capabilities are comparable (similar 
numbers of vehicles, similar performance of the vehicles), the RDD is much 
harder to solve. This behavior is similar to the complexity of balanced games 
like chess ^B]. In Section 7 we discuss how knowledge of the phase transition
can be exploited to reduce computational complexity. 

6 Multi-level implementation 

Now that we have a fast solver that generates near-optimal assignments, we 
test it in a dynamically changing environment. We consider the RoboFlag Drill 
problem with attackers that have a simple noncooperative strategy built in, 
which is unknown to the defenders. The hope is that frequent replanning, at all
levels of the hierarchical decomposition, will compensate for our assumption that
the attackers move with constant velocity.

We use a multi-level receding horizon architecture, shown in Figure 16, to
generate the defenders' strategy. The task assignment module at the top level
implements the branch and bound algorithm presented in this paper. It generates
the assignment $\alpha_d$ for each defender $d$, sending new assignments to the
middle level of the hierarchy at the rate $R_{TA}$. Therefore, the algorithm returns
the best assignment computed in the time window $1/R_{TA}$.

There is a task completion module for each defender at the middle level of the
hierarchy, which receives an updated assignment $\alpha_d$ from the task assignment
module at the rate $R_{TA}$. At a rate $R_{TC}$, the task completion module generates a
trajectory from defender $d$'s current state to a point that will intercept attacker
$\alpha_d(1)$, assuming the attacker moves at constant velocity. If attacker $\alpha_d(1)$ is
intercepted, a trajectory to intercept attacker $\alpha_d(2)$ is generated, and so on.

Figure 16: The multi-level architecture for the defending vehicles used in our
implementation.

The vehicle module at the bottom of the hierarchy receives an updated
trajectory from the task completion module at the rate $R_{TC}$. The module
propels the vehicle along this trajectory until it receives an update.

The attackers are taken to be the same vehicles as the defenders (described
in Appendix A). For the attacker intelligence, we use the architecture shown
in Figure 17. The levels of the hierarchy are decoupled, so each attacker acts
independently. The simple intelligence for each attacker is contained in the top
level of the hierarchy. The primary objective is to arrive at the origin of the field
in minimum time. However, the attacker tries to avoid the defenders if they get
too close. The radius of each defender is artificially enlarged by a factor $\beta > 1$. If
the artificially enlarged defenders obstruct an attacker's path toward the origin,
the attacker treats them as obstacles, finding a destination that results in an
obstacle-free path. The destination is found using a simple reactive obstacle
avoidance routine used in RoboCup |SJ|2Z1. The attacker intelligence module
runs at the rate $R_I$.

The trajectory generation module at the middle of the hierarchy receives an
updated destination at the rate $R_I$. The module generates a trajectory from
the current state of the attacker to the destination with zero final velocity at
the rate $R_{TG}$, using techniques from [27]. The vehicle module at the bottom
level of the hierarchy is the same as that for the defenders.

Figure 17: The multi-vehicle architecture for the attackers used to test the
defender architecture.

Because the algorithms are more computationally intensive at the higher
levels of the hierarchy than at the lower levels, the rates are constrained as
follows: $R_{TA} < R_{TC}$ and $R_I < R_{TG}$. In the simulations that follow, we take
$R_{TG} = R_{TC}$ because the middle levels of the two hierarchies are computationally
comparable. We also set $R_I = R_{TG}/10$. Therefore, if the trajectory generation
module replans every time unit, the attacker intelligence module replans
every ten time units.
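The rate relations above can be made concrete by listing, in units of the trajectory-level period $1/R_{TC}$, the time steps at which each module replans. In this sketch the module names and the function `replan_steps` are ours; each rate is given as a fraction of $R_{TC}$ (so $R_I = R_{TG}/10$ becomes `Fraction(1, 10)` because $R_{TG} = R_{TC}$), and a rate of zero means the module plans once at the start and never replans, as in the $R_{TA} = 0$ case discussed below.

```python
from fractions import Fraction
from typing import Dict, List, Union

Rate = Union[int, Fraction]


def replan_steps(rates: Dict[str, Rate], horizon: int) -> Dict[str, List[int]]:
    """Return, for each module, the time steps in [0, horizon) at which it (re)plans.

    rates[name] is the module's replanning rate as a fraction of R_TC, so one
    time step equals one trajectory-level update period 1/R_TC.
    """
    steps: Dict[str, List[int]] = {}
    for name, rate in rates.items():
        rate = Fraction(rate)
        if rate == 0:
            steps[name] = [0]                    # plan once at the start of play only
            continue
        period = 1 / rate                        # replanning period in time steps
        if period.denominator != 1:
            raise ValueError(f"{name}: period {period} is not a whole number of steps")
        steps[name] = list(range(0, horizon, int(period)))
    return steps
```

With $R_{TA} = R_{TC}/15$, for example, the assignment is recomputed at steps 0, 15, 30, and so on, so each call to the branch and bound solver has a window of 15 trajectory updates in which to return its best assignment.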

First, note that when both $R_{TA}$ and $R_{TC}$ are zero, there is no replanning.
In this case, all attackers usually enter the Defense Zone. They easily avoid the 
defenders because the defenders execute a fixed plan, which becomes obsolete 
once the attackers start using their intelligence. 

Next, we present simulation results of the RoboFlag Drill with intelligent
attackers and defenders. We consider problems with eight defenders (n = 8),
four attackers (m = 4), vA = vD, $R_{TC} = R_{TG} > 0$, and $R_I = R_{TG}/10$. We
consider several different values of the rate at which the task assignment module
replans ($R_{TA}$). For each value, we solve 200 randomly generated instances of
the problem. As an evaluation metric, we use the average number of attackers
that enter the Defense Zone during play.

For the case $R_{TA} = 0$, there is no replanning at the task assignment level.
Replanning only occurs at the task completion level. The defenders are given a 
plan from the task assignment module at the beginning of play. Each defender 
executes its assignment throughout, periodically recalculating the trajectory it 
must follow to intercept the next attacker in its sequence. For this case, on 
average, 58% of the attackers enter the Defense Zone during play. 

For the case Rta > 0, replanning occurs at both the task assignment level 
and the task completion level of the hierarchy. In addition to recomputing 
trajectories to intercept the next attacker in each defender's assignment, the 
defender assignments are recomputed. This redistributes tasks based on the 
current state of the dynamically changing environment, providing feedback. For 
$R_{TA} = R_{TC}/40$, $R_{TC}/20$, and $R_{TC}/15$, an average of 38%, 34%, and 32.5%
of the attackers enter the Defense Zone during play, respectively. Therefore, 
replanning at the task assignment level has helped increase the utility of the 
strategies generated for the team of defenders. 

In Figure 18 we show snapshots of an instance of the RoboFlag Drill simulation
for the case where the defenders do not replan at the task assignment
level. In this case, all attackers enter the Defense Zone. In Figure 19 we show
snapshots of the same instance of the RoboFlag Drill simulation, but in this
case, the defenders replan at the task assignment level ($R_{TA} = R_{TC}/15$). The
defenders cooperate to keep all attackers out of the Defense Zone. For example,
the two defenders at the lower left of the field cooperate to intercept an attacker.

Figure 18: Snapshots of the RoboFlag Drill simulation with defender replanning
at the task completion level only ($R_{TA} = 0$). In this case, all attackers enter
the Defense Zone. The large circles are the defenders, and the small circles are
the attackers. The solid lines are trajectories. Each cross connected to a dashed
line is an attacker's desired destination.



7 Discussion 

We developed a decomposition approach that generates cooperative strategies 
for multi-vehicle control problems, and we motivated the approach using an 
adversarial game called RoboFlag. In the game, we fixed the strategy for one 
team and used our approach to generate strategies for the other team. By 
introducing a set of tasks to be completed by the team and a task completion 
method for each vehicle, we decomposed the problem into a high level task 
assignment problem and a low level task completion problem. We presented 
a branch and bound solver for task assignment, which uses upper and lower 
bounds on the optimal assignment to prune the search space. The upper bound 
algorithm is a greedy algorithm that generates feasible assignments. The best 
greedy assignment is stored in memory during the search, so the algorithm can 
be stopped at any point in the search and a feasible assignment is available. 
Figure 19: Snapshots of the RoboFlag Drill simulation with defender replanning
at the task completion level and task assignment level ($R_{TA} = R_{TC}/15$).
Because there is replanning at both levels, the defenders cooperate to intercept
all attackers. The large circles are the defenders, and the small circles are the
attackers. The solid lines are trajectories. Each cross connected to a dashed
line is an attacker's desired destination.
In our computational complexity study, we found that solving the task
assignment problem is computationally intensive, which was expected because the
problem is NP-hard. However, we showed that the solver converges to the optimal
assignment quickly and takes much more time to prove that the assignment is
optimal. Therefore, the solver can be run in a time window to generate
near-optimal assignments for real-time multi-vehicle strategy generation. To increase
the speed of the algorithm, it may be advantageous to distribute the computation
over the set of vehicles [30], taking advantage of the distributed structure
of the problem.

We also studied the computational complexity of the solver as parameters 
were varied. We varied the ratio of the maximum velocities of the opposing 
vehicles, and we varied the ratio of the number of vehicles per team. We found 
that when one team has a capability advantage over the other, such as a higher 
maximum velocity or more vehicles, the solution to the task assignment prob- 
lem is easy to generate. However, when the teams are comparable in capability, 
finding the optimal assignment to the problem is much more computationally 
intensive. This type of analysis can help in deciding how many vehicles to de- 
ploy in an adversarial game and what capabilities the vehicles should have. In 
addition, knowledge of the phase transition may be exploited to reduce
computational complexity. In [31, 32], phase transition 'backbones' are exploited
to decompose combinatorial problems into many separate subproblems, which
are much less computationally intensive. This decomposition is amenable to
parallel computation. In [15], it is shown that the hardness of a problem
depends on the parameters of the problem (as we showed above) and the details
of the algorithm used to solve the problem. Therefore, it is possible that the
hard instances of our problem, which lie along the phase transition, may be
solved faster if we use a different solution algorithm. The authors in ^H]
suggest adding randomization to the algorithm and using a rapid restart policy.
The restart policy selects a new random seed for the algorithm and restarts it 
if the algorithm is not making sufficient progress with the current seed. 

Finally, we demonstrated the effectiveness of our approach in an environment
where the adversaries had an unknown, noncooperative intelligence. We found
that the limitations of the simple adversary model used in the solver could be
mitigated by a multi-level replanning architecture. This architecture has two
levels: low-level task completion and high-level task assignment. When
replanning does not occur at either level, the solver fails because it
generates a plan that becomes obsolete as the adversaries use their
intelligence. When replanning occurs only at the task completion level, the
solver generates an assignment once. As the adversaries use their
intelligence, the task completion component is run periodically for each
vehicle, generating a new trajectory to complete the tasks in the vehicle's
assignment. This was somewhat effective at handling the unknown intelligence.
When replanning occurs at both levels, the task assignment component is run
periodically in addition to the task completion component. We found this
replanning architecture effective at retasking vehicles in the dynamically
changing environment. On average it is advantageous to replan frequently, but
in some instances frequent replanning is not advantageous: the vehicles are
retasked so often that their productivity is reduced. Therefore, it may be
desirable to place a penalty on changing each vehicle's current task.
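One possible form for such a penalty is sketched below, under several assumptions. It uses the one-to-one vehicle-task model with a known cost matrix, a hypothetical weight `switch_penalty`, and brute-force enumeration in place of the real solver. Every vehicle whose new task differs from its current task adds the penalty to the cost. As a result, small predicted gains no longer trigger retasking.

```python
import itertools

def replan_with_penalty(cost, current, switch_penalty):
    """Assignment minimizing task cost plus a penalty per retasked vehicle.

    cost[i][j]: assumed cost of vehicle i performing task j.
    current[i]: the task vehicle i is executing now.
    """
    n = len(cost)

    def total(perm):
        switches = sum(1 for i in range(n) if perm[i] != current[i])
        return sum(cost[i][perm[i]] for i in range(n)) + switch_penalty * switches

    return min(itertools.permutations(range(n)), key=total)

cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
current = (0, 1, 2)                              # current task cost: 6
print(replan_with_penalty(cost, current, 0.0))   # → (1, 0, 2)
print(replan_with_penalty(cost, current, 1.0))   # → (0, 1, 2)
```

With no penalty, the vehicles are retasked to save one unit of cost. With a penalty of 1 per switch, two switches cost more than that saving, so the current assignment is kept.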

In general, we feel the multi-level replanning approach is a natural way to
handle multi-vehicle cooperative control problems. There are many directions
for further research, including the addition of a high-level learning module
to generate better models of the adversaries through experience [38].



A Vehicle Dynamics 



The wheeled robots of Cornell's RoboCup Team [37] are the defenders in the
RoboFlag problems we consider in this paper. We state their governing
equations and simplify them by restricting the allowable control inputs [27].
The result is a linear set of governing equations coupled by a nonlinear
constraint on the control input. This procedure allows real-time calculation
of many near-optimal trajectories and has been successfully used by Cornell's
RoboCup team |57II27|.

Each vehicle has a three-motor omnidirectional drive, which allows it to move
in any direction irrespective of its orientation. This gives it superior
maneuverability compared with traditional nonholonomic (car-like) vehicles.
The nondimensional governing equations for each vehicle are given by
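$$\begin{bmatrix}\ddot{x}(t)\\ \ddot{y}(t)\\ \ddot{\theta}(t)\end{bmatrix} + \begin{bmatrix}\dot{x}(t)\\ \dot{y}(t)\\ \frac{2mL^2}{J}\dot{\theta}(t)\end{bmatrix} = u(\theta(t),t), \qquad (22)$$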



where (x(t), y(t)) are the coordinates of the robot on the playing field, θ(t)
is the orientation of the robot, and u(θ(t),t) = P(θ(t))U(t) can be thought of
as a θ(t)-dependent control input, where

$$U(t) = \begin{bmatrix}U_1(t) & U_2(t) & U_3(t)\end{bmatrix}^T. \qquad (24)$$



In the equations above, m is the mass of the vehicle, J is the vehicle's moment
of inertia, L is the distance from the drive to the center of mass, and U_i(t)
is the voltage applied to motor i.
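Each row of (22) is a damped double integrator driven by one component of u. The sketch below evaluates the accelerations in (22) and integrates the velocities with forward Euler. The θ-dependent input u(θ(t),t) is taken as already evaluated, because computing it from the motor voltages would require P(θ). The name `k_theta` is a hypothetical label for the coefficient 2mL²/J.

```python
import numpy as np

def accel_22(qdot, u, k_theta):
    """Accelerations from (22): qddot = u - diag(1, 1, k_theta) @ qdot.

    qdot = (xdot, ydot, thetadot); u = u(theta(t), t), already evaluated.
    k_theta is a hypothetical name for the coefficient 2 m L^2 / J.
    """
    damping = np.diag([1.0, 1.0, k_theta])
    return np.asarray(u, dtype=float) - damping @ np.asarray(qdot, dtype=float)

# Under a constant input, each velocity settles where damping balances the
# input: xdot -> u_x, ydot -> u_y, thetadot -> u_theta / k_theta.
qdot = np.zeros(3)
u = np.array([1.0, 0.0, 1.0])
dt = 0.01
for _ in range(2000):                 # forward Euler over 20 time units
    qdot = qdot + dt * accel_22(qdot, u, k_theta=2.0)
print(qdot)                           # close to [1.0, 0.0, 0.5]
```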

By restricting the admissible control inputs, we simplify the governing
equations in a way that still allows near-optimal performance. The set of
admissible voltages 𝒰 is given by the unit cube, and the set of admissible
control inputs is given by P(θ)𝒰. The restriction replaces the set P(θ)𝒰 with
the maximal θ-independent set, found by taking the intersection of the sets of
admissible controls over all orientations θ. This set is characterized by the
inequalities
$$u_x(t)^2 + u_y(t)^2 \le \frac{\left(3 - |u_\theta(t)|\right)^2}{4} \qquad (25)$$

and

$$|u_\theta(t)| \le 3, \qquad (26)$$
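A direct membership test for this θ-independent set is sketched below. It assumes (25) and (26) read u_x² + u_y² ≤ (3 − |u_θ|)²/4 and |u_θ| ≤ 3, and the function name is hypothetical. At u_θ = 0 the admissible translational inputs form a disk of radius 3/2, which shrinks linearly to a point as |u_θ| approaches 3.

```python
def is_admissible(u_x, u_y, u_theta):
    """Membership in the theta-independent control set of (25)-(26)."""
    if abs(u_theta) > 3.0:                                        # (26)
        return False
    return u_x ** 2 + u_y ** 2 <= (3.0 - abs(u_theta)) ** 2 / 4.0  # (25)

print(is_admissible(1.5, 0.0, 0.0))   # → True: radius 3/2 at u_theta = 0
print(is_admissible(1.0, 0.0, 1.1))   # → False: radius has shrunk to 0.95
```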

where the θ-independent control is given by (u_x(t), u_y(t), u_θ(t)). The
equations of motion become
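$$\begin{bmatrix}\ddot{x}(t)\\ \ddot{y}(t)\\ \ddot{\theta}(t)\end{bmatrix} + \begin{bmatrix}\dot{x}(t)\\ \dot{y}(t)\\ \frac{2mL^2}{J}\dot{\theta}(t)\end{bmatrix} = \begin{bmatrix}u_x(t)\\ u_y(t)\\ u_\theta(t)\end{bmatrix}, \qquad (27)$$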



subject to constraints (25) and (26), which couple the degrees of freedom. To
decouple the θ dynamics, we set |u_θ(t)| ≤ 1. Then constraint (25) becomes
$$u_x(t)^2 + u_y(t)^2 \le 1. \qquad (28)$$
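The reduction from (25) to (28) takes one line, assuming (25) bounds u_x(t)² + u_y(t)² by (3 − |u_θ(t)|)²/4:

$$|u_\theta(t)| \le 1 \;\Longrightarrow\; 3 - |u_\theta(t)| \ge 2 \;\Longrightarrow\; \frac{\left(3 - |u_\theta(t)|\right)^2}{4} \ge 1.$$

Every (u_x, u_y) in the unit disk therefore satisfies (25) for any u_θ with |u_θ(t)| ≤ 1. The bound is tight at |u_θ(t)| = 1, so the translational and rotational inputs can be chosen independently.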

Now the equations of motion for the translational dynamics of the vehicle are
given by

$$\ddot{x}(t) + \dot{x}(t) = u_x(t), \qquad \ddot{y}(t) + \dot{y}(t) = u_y(t), \qquad (29)$$

subject to constraint (28). In state-space form we have

$$\dot{\mathbf{x}}(t) = A_c\,\mathbf{x}(t) + B_c\,\mathbf{u}(t), \qquad (30)$$

where x = (x, y, ẋ, ẏ) is the state and u = (u_x, u_y) is the control input.
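Reading (29) row by row, with the state ordered (x, y, ẋ, ẏ), gives the matrices A_c and B_c coded below. Each axis obeys v̇ = −v + u, so the zero-order-hold discretization of (30) has a closed form. That form is convenient when trajectories are regenerated at a fixed replanning rate. The function names and the step size are assumptions for illustration. The step function rejects inputs outside the unit disk (28).

```python
import numpy as np

# State (x, y, xdot, ydot) and control (u_x, u_y), as in (29)-(30).
A_c = np.array([[0.0, 0.0, 1.0, 0.0],
                [0.0, 0.0, 0.0, 1.0],
                [0.0, 0.0, -1.0, 0.0],
                [0.0, 0.0, 0.0, -1.0]])
B_c = np.array([[0.0, 0.0],
                [0.0, 0.0],
                [1.0, 0.0],
                [0.0, 1.0]])

def zoh_discretize(dt):
    """Exact zero-order-hold discretization of (30) for step size dt."""
    e = np.exp(-dt)
    A_d = np.array([[1.0, 0.0, 1.0 - e, 0.0],
                    [0.0, 1.0, 0.0, 1.0 - e],
                    [0.0, 0.0, e, 0.0],
                    [0.0, 0.0, 0.0, e]])
    B_d = np.array([[dt - 1.0 + e, 0.0],
                    [0.0, dt - 1.0 + e],
                    [1.0 - e, 0.0],
                    [0.0, 1.0 - e]])
    return A_d, B_d

def step(state, u, dt):
    """Advance one step with u held constant, rejecting inputs outside (28)."""
    if u[0] ** 2 + u[1] ** 2 > 1.0:
        raise ValueError("control violates u_x^2 + u_y^2 <= 1")
    A_d, B_d = zoh_discretize(dt)
    return A_d @ state + B_d @ np.asarray(u, dtype=float)

state = np.zeros(4)
for _ in range(200):                  # full input along x for 20 time units
    state = step(state, (1.0, 0.0), 0.1)
print(state)                          # xdot approaches 1; x approaches t - 1
```

For small dt, A_d ≈ I + A_c·dt and B_d ≈ B_c·dt, which recovers the forward-Euler approximation of (30).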